PHONETIC TRADING RELATIONS AND CONTEXT EFFECTS:

NEW EXPERIMENTAL EVIDENCE FOR A SPEECH MODE OF PERCEPTION*



Bruno H. Repp 



Abstract. This article reviews a variety of experimental findings,
most of them obtained in the last few years, that show that the
perception of phonetic distinctions relies on a multiplicity of
acoustic cues and is sensitive to the surrounding context in very
specific ways. Nearly all of these effects have correspondences in
speech production, and they are readily explained by the assumption
that listeners make continuous use of their tacit knowledge of
speech patterns. A general auditory theory that does not make
reference to the specific origin and function of speech can, at
best, handle only a small portion of the wealth of phenomena 
reviewed here. Special emphasis is placed on several recent studies 
that obtained different patterns of results depending on whether 
identical stimuli were perceived as speech or as nonspeech. These 
findings provide strong empirical evidence for the existence of a 
special speech mode of perception. 



INTRODUCTION 

Speech is a specifically human capacity. Just as humans are uniquely 
enabled to produce the complex stream of sound called speech, one might 
suppose that they make use of special perceptual mechanisms to decode this 
complex signal. Of course, since speech is remarkably different from all 
other environmental sounds, it is highly likely that there are perceptual and 
cognitive processes that occur only when speech is the input. Otherwise, 
speech simply would not be perceived as what it is. To make sense, the 
question of whether speech perception is different from other forms of 
perception is best restricted to those aspects of speech that are not 
obviously unique, e.g., to its being an acoustic signal that can be described 
in the same physical terms as other environmental sounds. Then the question 
may be raised whether the perceptual translation of this acoustic signal into 
the sequence of discrete linguistic units that we experience (i.e., phonetic 
perception) requires the assumption of special mechanisms, or whether it can 
be reduced to a combination of auditory processes known to be involved also in
the perception and interpretation of nonspeech sounds. Even this modest
question, however, presupposes that the linguistic categories applied by a
listener, even though they are appropriate only for speech, are not unique in
any essential sense but rather can be viewed as labels applied to specific
auditory patterns. This assumption is probably wrong, but it must be granted
now for the argument to proceed.

*A revised version is to appear in Psychological Bulletin.

The precise nature of the processes and mechanisms that support phonetic
perception has been the subject of much discussion. A number of speech
researchers hold the view that speech perception is special in the sense that
it takes account of the origin of the signal in the action of a speaker's
articulatory system. This general view underlies the well-known motor theory
of speech perception (e.g., Liberman, Cooper, Shankweiler, & Studdert-Kennedy,
1967) as well as the theory of analysis-by-synthesis (Halle & Stevens, 1959).
More recently, it has been fertilized and augmented by ideas derived from
Gibson's (1966) theory of event perception (see, e.g., Bailey & Summerfield,
1980; Neisser, 1976; Summerfield, 1979), which postulates that all perception
is directed towards the source of stimulation. While, in the Gibsonian
framework, speech perception is not seen as basically different from the
perception of other auditory (and visual) events, the special nature of the
source (the human vocal tract) is acknowledged and emphasized. In this view,
speech perception is special because the source of speech is special. There
are other researchers, however, who would concur only with the second half of
that statement (the special nature of the source), not with the first. They
pursue the hypothesis that the processes involved in speech perception are
essentially the same as those that support the auditory perception of
nonspeech sounds, and that they operate without implicit reference to the
sound-producing mechanisms that generate the speech signal. In this view, the
specific complexity of speech perception results merely from the diversity and
the number of elementary auditory processes required to deal with an
intricately structured signal (see, e.g., Divenyi, 1979; Kuhl & Miller, 1978;
Pastore, 1981; Schouten, 1980; Stevens, 1975). These two views are perhaps
most clearly distinguished by their different orientations to the evolution of
speech perception: Whereas, according to the first view, special perceptual
processes evolved hand in hand with articulatory capabilities to handle the
complex output of a speaker's vocal tract, the second view assumes that the
vocal productions of early hominids were fitted into a mold created by the
pre-existing sensitivities and limitations of their auditory systems.

Which of these two views is correct is, in part, an empirical question
that rests on many possible sources of evidence, including the reactions to
speech of animal and human infant subjects, traditional laboratory
experiments, and electrophysiological and clinical observations. In this
review, I will focus on a set of recent attempts to demonstrate the
peculiarities of speech perception in the laboratory, using normal adult
human subjects. This kind of evidence has been, and continues to be, central
to the argument, as it is easier to obtain, permits a variety of approaches,
and is perhaps more readily interpreted than some of the other research. This
is not to deny that some of the most crucial results will come from infant and
animal experiments; however, this research characteristically lags one step
behind the standard laboratory findings, and studies that extend the latest
findings on college students' perception to other subject populations are just
getting under way as this review is being written.



Less than a decade ago, a rich set of experimental data apparently
supported the existence of a special speech mode of perception, distinct from
other kinds of auditory perception. However, within a few years that support
seems to have all but evaporated. The history of these events will be
summarized and commented upon in the first part of the present paper. Since
the main purpose of that section is to set the stage for the following review,
my treatment of what are complex and often controversial issues will
necessarily be somewhat sketchy and betray my biases. In the second part, new
evidence (much of it collected over the last few years) will be reviewed and
discussed. I will conclude that we have, once again, strong experimental
support for a special phonetic mode of perception.



THE OLD EVIDENCE 

In a well-known paper, Wood (1975) listed six laboratory phenomena that,
at that time, seemed to provide strong converging evidence for the existence
of special processes in speech perception. One phenomenon is the "phoneme
boundary effect," which is commonly subsumed under the more general term,
categorical perception. It is the finding that two speech stimuli are easier
to discriminate when they can be assigned to different linguistic categories
than when, though separated by an equivalent physical difference, they are
perceived as belonging to the same category. A second phenomenon is selective
adaptation, the shift of the category boundary on a synthetic speech continuum
following repeated presentation of one endpoint stimulus. Three other
phenomena have to do with hemispheric specialization: the dichotic right-ear
advantage, the right-ear advantage in temporal-order judgments of speech
stimuli, and differences in evoked potentials from the two hemispheres in
response to speech stimuli. A sixth phenomenon concerned asymmetric
interference between auditory and phonetic stimulus dimensions in a speeded
classification task. Many of the findings that Wood referred to under these
headings have been excellently reviewed by Studdert-Kennedy (1976).

At the time the Wood and Studdert-Kennedy papers were written, all of the
above-named phenomena seemed to be specific to speech; that is, they were
apparently not obtained with nonspeech stimuli. However, a few years later,
the picture had changed considerably. Using Wood's enumeration of findings as
their starting point, both Cutting (1978) and Schouten (1980) reviewed more
recent research using the various paradigms and concluded independently that
there was no evidence for a special phonetic mode of perception. After that
statement, the views of these two authors diverge: Cutting, a vigorous
proponent of the Gibsonian view, argues for considering speech perception as
merely one instance of auditory event perception (i.e., the perception of
auditory events other than speech may be as complex and special as speech
perception, or nearly so), while Schouten, who represents a more narrowly
psychophysical orientation, states rather bluntly that "speech and non-speech
auditory stimuli are probably perceived in the same way" (p. 71), implying
that all auditory perception rests on the same elementary processes.

The conclusions of both authors reflect their disillusion over the 
failure of a number of experimental techniques to produce results specific to
speech. Since the relevant evidence has been competently reviewed by them and
by others, I will deal with it only briefly, focusing primarily on its
interpretation.

Categorical Perception

The "phoneme boundary effect" singled out by Wood (1975), the enhanced
discriminability across the phonetic boundary on a synthetic speech continuum,
is merely one aspect of the complex phenomenon termed categorical perception.
Other aspects are reduced context sensitivity in stimulus categorization and
predictability of discrimination performance from identification scores (Repp,
Healy, & Crowder, 1979). However, these latter two aspects have not been
claimed to be specific to speech.
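
What "predictability of discrimination performance from identification
scores" can mean in practice is sketched below. The sketch assumes a
two-category ABX task in which listeners discriminate solely on the basis of
covert category labels; under that assumption the predicted proportion correct
for a stimulus pair is 0.5 * (1 + (p_a - p_b)^2), where p_a and p_b are the
probabilities of assigning each stimulus to the first category. The function
name and the identification scores are hypothetical and chosen only for
illustration; they are not data from any study cited here.

```python
def predicted_abx(p_a: float, p_b: float) -> float:
    """Predicted ABX proportion correct if listeners rely only on labels.

    p_a, p_b: probability that stimulus A (or B) is given the first of two
    category labels. Assumption: labels are assigned independently on each
    presentation and the listener guesses when A and B receive the same label.
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


# Hypothetical identification function along a 7-step synthetic continuum
# (probability of the first category label at each step).
ident = [0.98, 0.95, 0.85, 0.40, 0.08, 0.03, 0.01]

# Predicted discrimination for all two-step pairs (steps i and i + 2).
predicted = [predicted_abx(ident[i], ident[i + 2]) for i in range(len(ident) - 2)]

# Index of the pair with the highest predicted discrimination.
peak = max(range(len(predicted)), key=predicted.__getitem__)

for i, p in enumerate(predicted):
    print(f"steps {i + 1}-{i + 3}: predicted proportion correct = {p:.3f}")
```

With these made-up scores the predicted discrimination function peaks for the
pair that straddles the category boundary and stays near chance within
categories, which is the pattern the phoneme boundary effect describes.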

The speech-specificity of the phoneme boundary effect has been challenged
on the grounds that analogous effects have been demonstrated for a variety of
nonspeech continua: noise-buzz sequences (Miller, Wier, Pastore, Kelly, &
Dooling, 1976), tone-onset-time (Pisoni, 1977), tone amplitude in the presence
of a reference signal (Pastore, Ahroon, Baffuto, Friedman, Puleo, & Fink,
1977), visual flicker (Pastore et al., 1977), musical intervals (Burns & Ward,
1978), and amplitude rise-time (Cutting & Rosner, 1974). The results for the
rise-time ("pluck"-"bow") continuum, which have been widely cited and followed
up, and on which Cutting (1978) rested his whole argument, have recently been
claimed to be artifacts due to faulty stimulus construction (Rosen & Howell,
1981), but the other findings appear to be solid. However, some of them are
not very surprising. If a psychophysical continuum is chosen on which some
kind of threshold is known to exist, such as the critical flicker fusion
threshold, it is obvious that two stimuli from opposite sides of the threshold
will be more discriminable than two stimuli from the same side. However, it
does not follow that, therefore, the phoneme boundary effect on a speech
continuum is also caused by a psychophysical boundary that happens to coincide
with the phoneme boundary. The problem is that, in most cases, we have no
good idea of what the psychophysical boundary ought to be. Moreover, a
phoneme boundary effect may be caused by the phoneme boundary itself, as
argued below. There are several reasons why the nonspeech studies referred to
above have done relatively little to clarify the issue.

First of all, only results obtained with nonspeech stimuli that have
something in common with speech are directly relevant to the question of
whether a specific phoneme boundary falls on top of a psychoacoustic
threshold. For example, the observations on the flicker fusion threshold
(Pastore et al., 1977) cannot have any direct implications for speech
perception. They show that categorical perception can occur in the nonspeech
domain, but they do not prove that the causes are the same as in a particular
speech case. Second, just how much certain nonspeech stimuli have in common
with speech stimuli they are intended to emulate is a matter of debate. It is
doubtful, for example, whether the relative onset time of two sinusoids
(Pisoni, 1977) successfully simulates the distinction between a voiced and a
voiceless stop consonant (cf. Pastore, Harris, & Kaplan, 1981; Pisoni, 1980;
Summerfield, in press), or whether amplitude rise-time has much to do with the
fricative-affricate distinction (Remez, Cutting, & Studdert-Kennedy, 1980).
Third, even those nonspeech continua (such as noise-buzz sequences) that
appear to copy a speech cue more or less faithfully yield results that, on
closer inspection, are not in agreement with speech results. For example,
individual listeners in the Miller et al. (1976) study showed boundaries as
short as 4 msec on a noise-buzz continuum, which is much shorter than any
boundaries for English-speaking listeners on the supposedly analogous
voice-onset-time dimension (see, e.g., Zlatin, 1974). Note also that auditory
thresholds may shift with extended practice in the laboratory, while
linguistic boundaries ordinarily do not; this creates a problem for comparing
the locations of the two. Fourth, and most significantly, the various
comparisons of categorical perception of speech and supposedly analogous
nonspeech stimuli generally have not taken into account the fact that there
are multiple cues for each phonetic contrast and that perception of one cue,
as it were, is not independent of the settings of other relevant cues. This
issue, which has received particular attention only in the last few years,
will be central to the second part of the present paper. Fifth, there are a
variety of other factors that influence the locations of phonetic boundaries:
language experience, speaking rate, stress, phonetic context, semantic
factors, and so on. It remains to be shown that psychophysical thresholds are
sensitive to all, or even some, of these variables (or their psychoacoustic
analogs). Finally, we note that there are examples of category boundary
effects on nonspeech continua that have no obvious psychophysical boundaries,
viz., for musical intervals (Burns & Ward, 1978; Siegel & Siegel, 1977) or
chords (Blechner, 1977; Zatorre & Halpern, 1979), which suggests that
well-established categories of non-psychophysical origin may dominate
perception.

In view of these arguments, one plausible account of the phoneme boundary
effect remains that it arises from the use of category labels in
discrimination. The best support for this hypothesis comes from studies that
show a change in speech sound discriminability consequent upon a redefinition
of linguistic categories for the same stimuli and the same listeners (e.g.,
Carden, Levitt, Jusczyk, & Walley, 1981). However, the use of category labels
in discrimination is not unique to speech. The difference between speech and
nonspeech in the discrimination paradigm probably rests on the nature of the
categories: Phonetic categories are not only more deeply engrained than other
categories, but they also bear a special relation to the acoustic signal. As
Studdert-Kennedy (1976) has put it, speech sounds "name themselves."
Therefore, linguistic categories will dominate perception in a discrimination
task to a larger extent than nonspeech categories that frequently do not even
exist pre-experimentally and, in those cases, merely serve to bisect the
stimulus range. In addition, the acoustic distinctions underlying a category
contrast may be finer in the case of speech and also are habitually ignored by
listeners in a natural situation; therefore, they are more difficult to access
in the context of a discrimination task.

The strongest evidence for the alternative hypothesis, that categorical
perception of speech rests on nonlinguistic auditory discontinuities in
perception, comes from research on human infants (for recent summaries, see
Jusczyk, 1981; Morse, 1979; Walley, Pisoni, & Aslin, 1981) and nonhuman
animals, particularly chinchillas (Kuhl, 1981; Kuhl & Miller, 1978). Allowing
for the inevitable methodological differences and limitations, infants and (so
far) chinchillas appear to perceive synthetic speech stimuli essentially the
same way adults do, including superior discrimination of stimuli from
different (adult) categories than of stimuli from the same category. These
effects obviously reflect some "natural" boundaries, but it is not entirely
clear whether these boundaries are strictly psychoacoustic in nature or
whether they perhaps reflect some innate or acquired sensitivity to
articulatory patterns. Even if they were psychoacoustic (this being the
received interpretation of the infant and chinchilla findings), it is not
certain that linguistic categories in fact depend on them. (See, however,
Aslin & Pisoni, 1980, for a different view.) For example, children in the
early stages of language use often are not able to make the perceptual
distinctions infants seem to be capable of (Barton, 1980). There are still
many open questions here. A fair assessment of the situation may be that the
evidence on phoneme boundary effects neither strongly supports nor disconfirms
the existence of a special speech mode of perception.

Selective Adaptation 

The shifting of phoneme boundaries on a continuum by repeated
presentation of stimuli from one category has been a favorite pastime of some
speech perception researchers ever since Eimas & Corbit (1973) discovered the
technique. (See Diehl, 1981, for a recent critical review.) In hindsight,
this effort seems not to have been worthwhile. Since various kinds of
nonspeech dimensions show selective-adaptation effects, it was to be expected
that auditory dimensions of speech can be adapted as well. On the whole, this
is what a score of studies show. The technique was considered interesting
because it was thought to reveal the existence of "phonetic feature detectors"
(Eimas & Corbit, 1973). However, the evidence for specifically phonetic
effects in selective adaptation is scant, and what there is can probably be
explained as shifts in response criteria or as effects of remote auditory
similarity. Recent experiments by Sawusch and Jusczyk (1981) and particularly
by Roberts and Summerfield (1981) strongly suggest that there is no phonetic
component in selective adaptation at all, and that the effect takes place
exclusively at a relatively early stage in auditory processing.



The concept of phonetic feature detectors is useless not only for the
explanation of selective adaptation results (cf. Remez, 1979) but also from a
wider theoretical perspective. None expresses this better than
Studdert-Kennedy (in press) when he says that "we are dealing with tautology,
not explanation. ... The error lies in offering to explain phonetic capacity by
making a substantive physiological mechanism out of a descriptive property of
language" (p. 225). For, "... the perceived feature is an attribute, not a
constituent, of the percept, and we are absolved from positing specialized
mechanisms for its extraction" (p. 227). Arguments such as these apply not
only to the concept of phonetic feature detectors but to the concept of the
feature detector in general. For these reasons, selective adaptation results
cannot have any implications for or against the existence of a special speech
mode.

Hemispheric Specialization 

The empirical results supporting a hemispheric asymmetry for speech and
language are rich and complex. While left-hemisphere advantages have been
reported for certain kinds of nonspeech sounds, the evidence that speech
processes are lateralized to the left hemisphere in the large majority of
individuals is unassailable. It has been claimed, however, that precisely
because certain nonspeech stimuli show similar effects, the lateralization of
speech should be explained by a more general principle, e.g., by a
specialization of the left hemisphere for auditory properties characteristic
of speech (Cutting, 1978; Schouten, 1980), or by an analytic-holistic
distinction between the two hemispheres (e.g., Bradshaw & Nettleton, 1981).
In commenting on the last-named paper, Studdert-Kennedy (1981) has argued that
the analytic-holistic hypothesis, while descriptively adequate, is
ill-conceived from a phylogenetic viewpoint. Rather, since lateralization
presumably evolved to support some behavior important to the species, it seems
more likely that lateralization of motor control preceded or caused
lateralization of speech processes, which in turn may be responsible for the
superior analytic capabilities of the left hemisphere. The apparent
specialization of the left hemisphere for certain auditory characteristics of
speech may as well be the consequence as the cause of the lateralization
of linguistic functions. Thus, the existing evidence on hemispheric
specialization can be interpreted in an alternative way that is more
compatible with a biological viewpoint and that recognizes the special status
of speech.

Other Laboratory Phenomena

Various other findings have been cited as evidence for or against a
speech mode of perception. Thus, Wood (1975) mentions the phenomenon of
asymmetric interference between auditory and linguistic dimensions in a
speeded classification task. While this finding (whose methodological details
need not concern us here) may reveal something about the auditory processing
of speech, its implications for the existence of a special speech mode of
perception are limited. Similar patterns of results have been obtained with
nonspeech auditory stimuli (Blechner, Day, & Cutting, 1976; Pastore, Ahroon,
Puleo, Crimmins, Golowner, & Berger, 1976), suggesting that the asymmetry has
a nonphonetic basis.

Schouten (1980) adds to Wood's list two findings that seem to have even
less bearing on the question of a phonetic mode of perception: a difference
in the stimulus duration needed for correct order judgments with sequences of
speech or nonspeech sounds (Warren, Obusek, Farmer, & Warren, 1969), and an
asymmetry in the perception of truncated CV and VC syllables (Pols & Schouten,
1978). The first finding probably reflects the fact that speech stimuli are
more readily categorized than nonspeech stimuli, while the second finding
seems altogether irrelevant, having most likely a psychoacoustic explanation.
It is a mistake to believe (as Schouten apparently does) that the "case
against a speech mode of perception" is strengthened by various findings of
auditory (nonphonetic) effects in speech perception experiments. Such effects
are likely to occur for, after all, speech enters through the ears. The
thesis of the present paper is, however, that these effects are relatively
inconsequential for the linguistic processing of speech.

By focusing primarily on the experimental paradigms listed in Wood's
(1975) article, Cutting (1978) and Schouten (1980) neglected a variety of
other observations that suggest the existence of a speech mode of perception.
Liberman et al. (1967) reviewed many properties that are peculiar to speech
and seem to require special perceptual skills. Foremost among these
properties is the invariance of phonetic perception over substantial changes
in the acoustic information; consider the well-known /di/-/du/ example, which
shows that the /d/ percept can be cued by radically different transitions of
the second formant. To achieve the same classification without reference to
the articulatory gesture common to /di/ and /du/, an exceedingly complex
"auditory decoder" would be required.

Liberman et al. (1967) also noted that the formant transitions
distinguishing /di/ and /du/ sound quite different from each other when they
are presented in isolation and do not engage the speech mode. In fact, when
second- or third-formant transitions are removed from a synthetic syllable and
presented to one ear while the rest of the speech pattern is presented to the
other ear, the transitions are found to do double duty: They are perceived as
whistles or chirps in one ear, but they also fuse with the remainder of the
syllable in the other ear to produce a percept equivalent to the original
syllable (Rand, 1974; Cutting, 1976). This "duplex perception" demonstrates
the simultaneous use of speech and nonspeech modes of perception and has
recently been further explored in experiments that will be reviewed later in
this paper.

Other authors have noted striking differences in subjects' responses 
depending on whether identical or similar stimuli were perceived as speech or 
nonspeech. For example, House, Stevens, Sandel, and Arnold (1962) found that
an ensemble of speech stimuli was easier to learn than various ensembles of
speechlike stimuli that, however, were not perceived as speech by the subjects
(cf. also Grunke & Pisoni, Note 1). Several studies of categorical perception
have shown that speech stimuli from a synthetic continuum are discriminated
well across a phonetic category boundary, while nonspeech analogs or 
components of the same stimuli are discriminated poorly or at chance (e.g., 
Liberman, Harris, Eimas, Lisker, & Bastian, 1961; Liberman, Harris, Kinney, & 
Lane, 1961; Mattingly, Liberman, Syrdal, & Halwes, 1971). As long as two 
decades ago, House et al. (1962) concluded that "an understanding of speech 
perception cannot be achieved through experiments that study classical 
psychophysical responses to complex acoustic stimuli. ... Although speech 
stimuli are accepted by the peripheral auditory mechanism, their 
interpretation as linguistic events transfers their processing to some 
nonperipheral center where the detailed characteristics of the peripheral 
analysis are irrelevant" (p. 142). This conclusion is still valid, as the 
remainder of this paper will attempt to show. 

Summary 

Of the various paradigms reviewed by Cutting (1978) and Schouten (1980),
some failed to support the existence of a speech mode of perception because
they were irrelevant to begin with. As far as categorical perception and
hemispheric specialization are concerned, some of the evidence may have been
misinterpreted. The fact that categorical perception and left-hemisphere
superiority can be obtained for certain nonspeech stimuli does away with
earlier claims that these phenomena are speech-specific. However, it does not
necessarily imply that similar patterns of results occur for the same reason
in speech and nonspeech; and if they do, it is not necessarily true that the
processes involved in the perception of nonspeech are more basic than, or the
prerequisites for, those supporting speech perception. We have seen that
there are other findings, not considered by Cutting and Schouten, that suggest
that speech perception differs from nonspeech auditory perception. It must be
acknowledged, however, that the empirical results are complex, and while they
hardly argue against the existence of a speech mode, they do not provide an
overwhelming amount of positive evidence either.

Certainly, the argument that speech perception is special would be
strengthened if new, less controversial results could be brought to bear on
the issue. The second part of this paper focuses on a set of rather recent
findings that add a new dimension to the argument. Since these results are
recent and have not been reviewed previously, they will be treated in more
detail. They may be grouped into three categories: phonetic trading
relations, context effects, and other perceptual integration phenomena. What
is common to all of them is that they deal with integration (over frequency,
time, or space) in phonetic perception.



THE NEW EVIDENCE 

The Distinction Between Trading Relations and Context Effects

It is known from many previous studies that virtually every phonetic
contrast is cued by several distinct acoustic properties of the speech signal.
It follows that, within limits set by the relative perceptual weights and by
the ranges of effectiveness of these cues, a change in the setting of one cue
(which, by itself, would have led to a change in the phonetic percept) can be
offset by an opposed change in the setting of another cue so as to maintain
the original phonetic percept. This is a phonetic trading relation.
According to Fitch, Halwes, Erickson, & Liberman (1980), there is a phonetic
equivalence between two cues that trade with each other. I prefer to use this
term in a slightly different way, for neither cue is perceived in isolation;
rather, they are perceived together and integrated into a unitary phonetic
percept. Therefore, the equivalence holds not so much between (a-b) units of
Cue 1 and (c-d) units of Cue 2, but rather between the phonetic percept caused
by setting a of Cue 1 and setting d of Cue 2 and the phonetic percept caused
by setting b of Cue 1 and setting c of Cue 2. These two percepts are
phonetically equivalent in the sense that they yield exactly the same
distribution of identification responses and are difficult to discriminate
(see below).
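
The notion of phonetic equivalence between the cue pairings (a, d) and (b, c)
can be illustrated with a toy calculation. The sketch below assumes, purely
for illustration, that identification follows an additive logistic function of
two cue values; this model, the weights, the bias, and the cue settings are
all hypothetical and are not taken from any of the studies reviewed here.

```python
import math


def p_category(cue1: float, cue2: float,
               w1: float = 0.15, w2: float = 0.30, bias: float = -9.0) -> float:
    """Probability of one phonetic response.

    Assumption: a hypothetical additive logistic model in which both cues
    contribute to a single decision variable (weights w1, w2 are made up).
    """
    return 1.0 / (1.0 + math.exp(-(w1 * cue1 + w2 * cue2 + bias)))


def offsetting_change(delta_cue1: float, w1: float = 0.15, w2: float = 0.30) -> float:
    """Change in Cue 2 that cancels a change of delta_cue1 in Cue 1."""
    return -w1 * delta_cue1 / w2


# Two settings of Cue 1 (a, b) and two settings of Cue 2 (c, d), in
# arbitrary units. Setting d is chosen so that (a, d) and (b, c) trade.
a, b = 20.0, 40.0
c = 10.0
d = c - offsetting_change(b - a)

p_ad = p_category(a, d)  # setting a of Cue 1 with setting d of Cue 2
p_bc = p_category(b, c)  # setting b of Cue 1 with setting c of Cue 2
p_bd = p_category(b, d)  # Cue 1 changed alone, with no offsetting change

print(f"P(a, d) = {p_ad:.3f}, P(b, c) = {p_bc:.3f}, P(b, d) = {p_bd:.3f}")
```

In this toy model the pairings (a, d) and (b, c) yield the same response
probability, whereas changing Cue 1 alone shifts it substantially. The
weights stand in for the "relative perceptual weights" mentioned above; real
trading relations also hold only within each cue's range of effectiveness,
which the sketch does not model.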

Trading relations occur among different cues for the same phonetic 
contrast. However, when the perception of a phonetic distinction is affected
by preceding or following context that is not part of the set of direct cues 
for the distinction (as illustrated in the next paragraph), we speak of a 
context effect. The context may be "close," i.e., it may constitute portions 
of the same coherent speech signal; or it may be "remote," referring to the 
relation between separate stimuli in a sequence, or between a precursor and a 
test stimulus. (Of course, the distinction between close and remote context 
is, to some extent, arbitrary.) Effects of close context, which are of special 
interest to us, are similar to trading relations in that they can be cancelled 
by an appropriate change in one or another cue relevant to the critical 
phonetic distinction. Conversely, a trading relation could be described 
(inappropriately) as a context effect, with one cue (the context) affecting 
the perception of another (the target). Formally, trading relations and
context effects are quite similar, but it is useful to distinguish them on
theoretical grounds. The distinction is best illustrated with an example.

Mann and Repp (1980) presented listeners with fricative noises from a
synthetic [ʃ]-[s] continuum, immediately followed by one of four periodic
stimuli. The periodic stimuli derived from natural utterances of [ʃa], [sa],
[ʃu], and [su], from which the fricative noise portion had been removed; thus,
they contained formant transitions appropriate for either [ʃ] or [s], and the
identity of the vowel was either [a] or [u]. The results showed that, for a
given ambiguous noise stimulus, listeners reported more instances of "s" when
the following formant transitions were appropriate for [s] rather than [ʃ],
and they also reported more instances of "s" when [u] followed rather than
[a]. The first effect is a trading relation, the second a context effect.
The effect of formant transitions on perception of the [ʃ]-[s] distinction is
a trading relation because the transitions are a cue to fricative place of
articulation. They are also a direct consequence of fricative production, and
this is obviously the reason why they are a cue to fricative perception. Note
that the transitions are integrated with the fricative noise cue into a
unitary phonetic percept; listeners do not perceive a noise plus transitions,
or a fricative consonant followed by a stop consonant, although a stop would
be perceived if the fricative noise were removed or silence were inserted 
between it and the periodic portion (Cole & Scott, 1973; Mann & Repp, 1980). 
The effect of vowel identity on fricative perception is different. Whether 
the vowel is [a] or [u] is not a consequence of fricative production, and 
vowel quality therefore does not constitute a direct cue for fricative 
perception. The vowel is not perceptually integrated with the noise cue— it 
remains audible as a separate phonetic segment. It is appropriate here to say 
that the perceived vowel quality modifies the perception or interpretation of
the fricative cues. This is a context effect, as distinct from a trading
relation. 1



As we will see below, trading relations and context effects have distinct 
(though related) explanations in a theory of phonetic perception, and it is 
that theoretical view that underlies the distinction in the first place. 
However, before we turn to the issue of explanation, a brief review of 
empirical findings shall be presented. 



The fact that there are multiple cues for most phonetic contrasts has 
been known for a long time. Much of this early knowledge derives from the
extensive explorations at Haskins Laboratories since the late 1940s. For
example, Delattre, Liberman, Cooper, and Gerstman (1952) showed that the first
two formants are important cues to vowel quality; Harris, Hoffman, Liberman,
Delattre, and Cooper (1958) demonstrated that both second- and third-formant
transitions contribute to the place-of-articulation distinction in stop
consonants; and Gerstman (1957) found that both frication duration and
rise-time are relevant to the fricative-affricate distinction. Lisker (1978b),
drawing on observations collected over a number of years, listed no fewer than
16 distinguishable cues to the /b/-/p/ distinction in intervocalic position.

From these and many other studies, a nearly complete list of cues has 
been accumulated over the years. However, the data were typically collected 
by varying one cue at a time, although there are some exceptions, such as 
Hoffman's (1958) heroic study, which varied three cues to stop place of 
articulation simultaneously. Restrictions on the size of stimulus ensembles
were imposed by the limited technology of the time, which made stimulus 
synthesis and test randomization very cumbersome. With the advent of modern 
computer-controlled synthesis and randomization routines, however, orthogonal 
variation of several cues in a single experiment became an easy task, and the
limit to the number of stimuli was set by the patience of the listener rather
than that of the investigator. The new technology led to a resurgence of
interest in the way in which multiple cues cooperate in signalling a phonetic
distinction. Since, for one reason or another, many of the early Haskins
studies had remained unpublished, certain results that had been known for
years by word of mouth or from preliminary reports only recently found their
way into the literature, after having been replicated with contemporary
methods. 

A word is in order about the definition of cues. The traditional 
approach, exemplified especially by the Haskins work (including my own), has 
been to dissect a spectrographic representation of the speech signal, 
following essentially visual Gestalt principles. A cue, then, is a portion of 
the signal that can be isolated visually, that can be manipulated 
independently in a speech synthesizer constructed for that purpose, and that
can be shown to have some perceptual effect. This way of defining cues has 
been challenged on two grounds: (1) The spectrogram is not the only, and not 
necessarily the best, representation of the speech signal. For example, the 
well-known work of Stevens and Blumstein (1978; Blumstein & Stevens, 1979,
1980) pursues the hypothesis that the shape of the total short-term spectrum
at certain critical points in the signal constitutes a perceptual cue; thus,
the individual formants and adjacent noise bursts are not treated as separate
cues. Such a redefinition of cues is justified as long as it does not bypass
the legitimate empirical issue of whether the elementary, spectrographically 
defined signal components are indeed integrated by the auditory system in this 
way (as they may be in the case of individual formants, but probably not in 
the case of other, more disparate types of cues). However, while definitions 
of such complex cues effectively combine information on one dimension (e.g., 
in the spectral domain), they typically sacrifice information on other
dimensions (e.g., in the temporal domain). Thus, the onset spectra examined
by Stevens and Blumstein are static and do not easily permit the description
of dynamic change over time. The issue revolves, in large part, around the
question of how the perceptually salient information in the signal is best
characterized — a question that, of course, lies at the heart of the present 
paper as well. The essential problem is that the totality of the cues for a 
given phonetic contrast apparently cannot be captured in a fully integrated
fashion as long as purely physical (rather than articulatory or linguistic) 
terms are used. 2 (2) Another criticism of a more far-reaching sort denies 
altogether the usefulness of fractionating the speech signal into cues (see, 
e.g., Bailey & Summerfield, 1980). This view, which rests on the precepts of
Gibsonian theory (Gibson, 1966), will be taken up in the concluding comments 
of this paper. 

I will not attempt to review in detail all recent studies of phonetic 
trading relations, of which there are quite a few. A brief and selective
overview shall suffice. Most studies had the purpose of clarifying the roles 
and surveying the effectiveness of different cues to various phonetic 
distinctions. Some studies that depart from this standard pattern will be 
considered later in more detail. Whereas the large majority of studies have 
used synthetic speech, some obtained similar information by cross-splicing 
components of natural utterances, or by combining such components with 
synthetic stimulus portions. Not all authors describe their findings as 
trading relations (a term used primarily by the Haskins group), but such 
relations are implied by the pattern of results. 

Voicing cues. Many studies have investigated multiple cues to the
voiced-voiceless distinction. For stop consonants in initial position, both
voice onset time (VOT) and the first-formant (F1) transition contribute to the
distinction (Stevens & Klatt, 1974; Lisker, Liberman, Erickson, Dechovitz, &
Mandler, 1977). The critical feature of the F1 transition, which can be
traded against VOT, is its onset frequency: If the onset frequency is lowered
in a phonetically ambiguous stimulus, the VOT must be increased for a
phonetically equivalent percept to obtain (Lisker, 1975; Summerfield &
Haggard, 1977). Another cue that can be traded for VOT is the amplitude of
the aspiration noise preceding the onset of voicing: If the amplitude of the
noise is increased, its duration (i.e., the VOT) must be decreased to maintain
phonetic equivalence (Repp, 1979). The fundamental frequency (F0) at the
onset of the voiced stimulus portion is another relevant cue (Haggard, Ambler,
& Callow, 1970) that presumably can be traded against VOT (see Repp, 1976,
1978b).

For stop consonants in intervocalic position, Lisker (1978b) has 
catalogued all the different aspects of the acoustic signal that contribute to 
the voicing distinction. They include the duration and offset characteristics 
of the preceding vocalic portion, the duration of the closure interval, the 
amplitude of voicing during the closure, and the onset characteristics of the 
following vocalic portion. Lisker's catalogue is based on a large number of
studies, not all of which have been published; however, see Lisker (1957,
1978a, 1978c), Lisker and Price (1979), Price and Lisker (1979). Trading
relations between voicing cues for intervocalic stops have also been studied
in French (Serniclaes, 1974, Notes 2 & 3) and in German (Kohler, 1979).

The voicing distinction for stop consonants in final position has also 
been intensively studied. Here, the duration of the vocalic portion is 
important (especially if no release burst is present) as well as its offset 
characteristics, the properties of the release burst, and the duration of the 
preceding closure. Trading relations among these cues have been investigated 
by Raphael (1972, 1981), Wolf (1978), and Hogan and Rozsypal (1980), among
others. 

The voicing distinction for fricatives in initial position has been 
studied by Massaro and Cohen (1976, 1977) who focused on the trading relation 
between fricative noise duration and F0 at the onset of periodicity. In a
similar fashion, Derr and Massaro (1980) and Soli (in press) studied the
trading relations among duration of the periodic ("vowel") portion, duration
of fricative noise, and F0 as cues to fricative voicing in utterance-final
position. Earlier studies of these cues include Denes (1955) and Raphael
(1972). 

Place of articulation cues. Trading relations among place of articulation
cues for stop consonants in initial position (F2 and F3 transitions,
burst frequency, and burst amplitude) were studied long ago by Harris et al.
(1958) and Hoffman (1958), and more recently by Dorman, Studdert-Kennedy, and
Raphael (1977) and by Mattingly and Levitt (1980). For stop consonants in
intervocalic position, Repp (1978a) found a trading relation between the
formant transitions in and out of the closure, and Dorman and Raphael (1980)
reported additional effects of closure duration and release burst frequency.
Bailey and Summerfield (1980), in a series of painstaking experiments,
investigated place cues for stops in fricative-stop-vowel syllables; these
cues included the offset spectrum of the fricative noise, the duration of the
closure period, and the formant frequencies at the onset of the vocalic
portion. Repp and Mann (1981a) recently demonstrated a trading relation
between fricative noise offset spectrum and vocalic formant transitions in
similar stimuli. Fricative noise spectrum and vocalic formant transitions as
joint cues to fricative place of articulation were investigated by Whalen
(1981), Mann and Repp (1980), and Carden et al. (1981).

Manner cues. Cues to stop manner of articulation (i.e., to presence 
vs. absence of a stop consonant) following a fricative and preceding a vowel 
were investigated by Bailey and Summerfield (1980), Fitch et al. (1980), and 
Best, Morrongiello, and Robson (1981). In each case, the trading relation 
studied was that between closure duration and formant onset frequencies in the
vocalic portion. The two last-named studies will be discussed in more detail 
below. Summerfield, Bailey, Seton, and Dorman (1981) have shown that duration 
and amplitude contour of the fricative noise preceding the silent closure also 
contribute to the stop manner contrast. 

Several cues to the fricative-affricate distinction in initial position 
(rise-time, noise duration) were investigated by Gerstman (1957); see also van
Heuven (1979). In a more recent set of experiments, Repp, Liberman, Eccardt,
and Pesetsky (1978) traded vocalic offset spectrum, closure duration, and
fricative noise duration as cues to a four-way distinction between
vowel-fricative, vowel-stop-fricative, vowel-affricate, and
vowel-stop-affricate. Trading relations among cues to the fricative-affricate
distinction in final position were reported by Dorman, Raphael, and Liberman
(1979: Exp. 5) and Dorman, Raphael, and Isenberg (1980).

Phonetic Equivalence 

It is obvious that, whenever two or more cues contribute to a given 
phonetic distinction, they can be traded against each other, within certain 
limits. What is not so obvious is that two stimuli with equal response 
distributions are truly equivalent in perception. Since most data on trading 
relations were collected in identification tasks with a restricted set of
response categories, subjects may have had no opportunity to report that 
certain stimuli sounded like neither of the alternatives. At a more subtle 
level, it may be the case that phonetically equivalent stimuli, even though 
they are labeled similarly, sound different in some way that subjects cannot 
easily explain in words. One way to assess this possibility is by means of a 
discrimination task. 3 

This was undertaken by Fitch et al. (1980) for the trading relation 
between silent closure duration and vocalic formant transition onsets as cues 
to stop manner in the "slit"-"split" distinction, and by Best et al. (1981) 
for the similar trading relation between silent closure duration and F1 
transition onset in the "say"-"stay" contrast. First, these authors deter- 
mined in an identification task how much silence was needed to compensate for 
a certain difference in formant onset frequency. Then they devised a 
discrimination task containing three different types of trials: On single-cue 
trials, the stimuli to be discriminated differed only in the spectral cue 
(formant onset frequency); they had the same setting of the temporal cue 
(silence). On cooperating-cues trials, the stimuli differed in both cues,
such that the stimulus with the lower formant onsets (which favor "split" or
"stay" percepts) also had the longer silence (which also favors "split" or
"stay" percepts). On conflicting-cues trials, the stimuli again differed in
both cues, but now the stimulus with the lower formant onsets had the shorter
silence, so that one cue favored "split" ("stay") and the other "slit"
("say"). Since the silence difference chosen was the one found to compensate
exactly for the spectral difference in the identification task, the stimuli in
the conflicting-cues condition were (on the average) phonetically equivalent. 4
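
The three trial types can be made concrete with a small sketch. It assumes,
purely for illustration, that the trade between the two cues is linear, so
that a fixed amount of silence offsets each hertz of F1 onset frequency. The
class Stimulus, the function names, and all numerical values are hypothetical
and are not taken from Fitch et al. (1980) or Best et al. (1981).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    silence_ms: float   # temporal cue: duration of the silent closure
    f1_onset_hz: float  # spectral cue: onset frequency of the F1 transition


def make_trial_pairs(base_silence_ms, low_f1_hz, high_f1_hz, silence_step_ms):
    """Build one stimulus pair for each of the three discrimination trial types."""
    short = base_silence_ms
    long_ = base_silence_ms + silence_step_ms
    return {
        # Same silence; only the spectral cue differs.
        "single_cue": (Stimulus(short, low_f1_hz), Stimulus(short, high_f1_hz)),
        # Lower F1 onset paired with the longer silence: both cues favor "stay".
        "cooperating_cues": (Stimulus(long_, low_f1_hz), Stimulus(short, high_f1_hz)),
        # Lower F1 onset paired with the shorter silence: the cues oppose each other.
        "conflicting_cues": (Stimulus(short, low_f1_hz), Stimulus(long_, high_f1_hz)),
    }


def stop_evidence(stimulus, ms_per_hz):
    """Hypothetical linear index: more silence and a lower F1 onset both favor a stop."""
    return stimulus.silence_ms - ms_per_hz * stimulus.f1_onset_hz


# Assumed values, for illustration only.
MS_PER_HZ = 0.125
LOW_F1, HIGH_F1 = 200.0, 400.0
STEP = MS_PER_HZ * (HIGH_F1 - LOW_F1)  # silence that offsets the spectral difference
PAIRS = make_trial_pairs(40.0, LOW_F1, HIGH_F1, STEP)
```

On this assumed index, the two members of the conflicting-cues pair receive
the same value, the single-cue pair differs by the spectral step alone, and
the cooperating-cues pair differs by twice that amount.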

The results of these experiments showed a clear difference among the 
three conditions: Subjects' discrimination performance in the category
boundary region was best in the cooperating-cues condition, worst in the
conflicting-cues condition, and intermediate in the single-cue condition.
Thus, it is true that (approximately) phonetically equivalent stimuli, namely
those in the conflicting-cues condition, are difficult to discriminate; they
"sound the same," whereas stimuli in the cooperating-cues condition sound
different, even
though they exhibit the same physical differences on the two relevant 
dimensions. The pattern of discrimination results follows that predicted from 
identification data, showing that stimuli differing on two auditory dimensions 
simultaneously are still categorically perceived (given that perception is 
categorical when each of these dimensions is varied separately). It is likely
that listeners could be trained to become more sensitive to the physical 
differences that do exist between phonetically equivalent stimuli, and the 
interesting question arises whether discrimination on cooperating-cues trials 
would continue to be superior to that on conflicting-cues trials. So far, no 
study has taken this approach. However, preliminary results from a related
series of experiments (Repp, 1981b) indicate that some trading relations
disappear when listeners try to discriminate pairs of stimuli that
unambiguously belong to the same phonetic category (i.e., phonetically equivalent
stimuli that are not from the boundary region), suggesting that these trading 
relations operate only when the stimuli are phonetically ambiguous. This 
leads us to the question of the origin of trading relations. 
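
The statement that discrimination follows the pattern predicted from
identification can be illustrated with a sketch. It uses the classic
two-category prediction for ABX discrimination, P(correct) = 0.5 * (1 + (p1 -
p2)^2), where p1 and p2 are the probabilities that the two stimuli receive the
same label. Whether the studies cited above used this particular task and
formula is not stated here, so the choice should be read as an assumption; the
logistic labeling function and all parameter values are likewise hypothetical.

```python
import math


def p_stay(silence_ms, boundary_ms, slope_ms=8.0):
    """Hypothetical logistic labeling function: probability of a "stay" response."""
    return 1.0 / (1.0 + math.exp(-(silence_ms - boundary_ms) / slope_ms))


def predicted_abx(p1, p2):
    """Two-category prediction of ABX proportion correct from labeling probabilities."""
    return 0.5 * (1.0 + (p1 - p2) ** 2)


# Assumed category boundaries (ms of silence) for the two F1 onset settings:
# a low F1 onset needs less silence than a high one to yield "stay".
BOUNDARY_LOW_F1, BOUNDARY_HIGH_F1 = 20.0, 45.0
SHORT, LONG = 30.0, 55.0  # the 25-ms difference equals the assumed boundary shift

single = predicted_abx(p_stay(SHORT, BOUNDARY_LOW_F1), p_stay(SHORT, BOUNDARY_HIGH_F1))
cooperating = predicted_abx(p_stay(LONG, BOUNDARY_LOW_F1), p_stay(SHORT, BOUNDARY_HIGH_F1))
conflicting = predicted_abx(p_stay(SHORT, BOUNDARY_LOW_F1), p_stay(LONG, BOUNDARY_HIGH_F1))
```

With these assumed values the predicted order is cooperating-cues above
single-cue above conflicting-cues, with the conflicting-cues pair at chance,
which is the ordering reported in the text.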

Explanation of Trading Relations: Phonetic or Auditory? 

The large number of trading relations surveyed above poses formidable 
problems for anyone who would like to explain speech perception in purely 
auditory terms. Why should cues as diverse as, say, VOT and F1 onset, or
silence and fricative noise duration, trade in the way they do? Auditory
theory has only two avenues open: Either the cues are integrated into a
unitary auditory percept at an early stage in perception (the auditory
integration hypothesis), or selective attention is directed to one of the cues
(which then must be postulated to be the essential cue for the relevant
phonetic contrast), and the perception of that cue is affected by the settings
of other cues (the auditory interaction hypothesis).

The auditory integration hypothesis is implicit in the work of Stevens
and Blumstein (1978; Blumstein & Stevens, 1979, 1980). To account for the
fact that release burst spectrum and formant transition onset frequencies are
joint cues to place of articulation of syllable-initial stop consonants,
Stevens and Blumstein assume that the perceptually relevant variable is the
integrated spectrum of the first 25 msec or so of a stimulus. In other words,
the burst (which is usually shorter than 25 msec) and the onsets of the
several formant transitions are considered an integral auditory variable.
Since both cues are spectral in nature and occur within a short time period,
this is not an unreasonable hypothesis, notwithstanding the different sources
of excitation (noise vs. periodic) of the two sets of cues in voiced stops.
In fact, Ganong (1978) found support for the perceptual integrality of burst
and formant transition cues in an ingenious experiment involving interaural
transfer of selective-adaptation effects. However, Stevens and Blumstein have
had only limited success with automatic classification of stop consonants
according to onset spectrum alone, and Kewley-Port (1981) recently
demonstrated that automatic stop consonant identification can be improved by
incorporating a measure of spectral change. Thus, even though onset spectrum
may be an important cue, it does not contain all the relevant information in
the signal.

The main problem with the auditory integration hypothesis seems to be 
that it applies only when the relevant cues are both spectral in nature, are 
of short duration, and occur simultaneously or in close succession. However, 
the cues are often spread out over a considerable stretch of time. For 
example, an explanation of the fact that both the formant transitions into and 
out of a stop closure contribute to the perceived place of articulation of a 
stop in medial position (Dorman & Raphael, 1980; Repp, 1978a; Repp & Mann, 
1981a) would require integration of spectra across a closure, i.e., over as 
much as 100 msec. Such a long integration period seems unlikely; certainly, 
it is much longer than that envisioned by Stevens and Blumstein (1978).
Trading relations that involve spectral and temporal cues (e.g., F1 onset and 
VOT for stop voicing in initial position) cannot be easily translated into 
purely spectral terms; and trading relations between purely temporal cues 
(e.g., silent closure duration and fricative noise duration for the fricative- 
affricate distinction in medial position) require a different explanation 
altogether. To be sure, there are some trading relations that do suggest 
auditory integration, such as that between VOT (i.e., aspiration noise 
duration) and aspiration noise amplitude (Repp, 1979), which is reminiscent of 
certain time-intensity reciprocities at the auditory threshold. In fact, 
preliminary data (Repp, 1981b) support this suggestion by showing that this
trading relation operates independently of whether a listener is making 
phonetic or auditory judgments of speech stimuli. In other cases, however,
the cues that participate in a trading relation are simply too diverse or too 
widely spread out to make auditory integration seem plausible. Or, to put it 
somewhat differently, whereas any such trading relation could be described as
resulting from auditory integration, this integration would no longer seem to 
be motivated by general principles of auditory perception; thus, it would have 
to be considered a speech-specific process. 

The auditory interaction hypothesis, which postulates that trading rela- 
tions arise because perception of a primary cue is affected by other cues, has 
even less concrete evidence in its favor, in part because most of the relevant 
studies remain to be done. In particular, it is not clear whether auditory 
interactions (masking, contrast, etc.) of the kind and extent required to 
explain certain trading relations are at all plausible. For example, to 
explain the trading relation between VOT and F1 onset frequency as cues to 
stop consonant voicing, it would have to be the case that a noise-filled
interval (VOT) sounds subjectively longer when followed by a periodic stimulus
with a relatively low onset frequency. At present, there are no
psychoacoustic data to support this hypothesis. Auditory psychophysics
involving nonspeech stimuli of the degree of complexity of speech is still in
its infancy
(cf. Pastore, 1981). Perhaps, as more is learned about the perception of 
complex sounds and sound sequences, some auditory explanations of what now 
appear to be phonetic phenomena will be forthcoming. 5 One serious problem that 
has vexed researchers since the time of the early Haskins research is that of
finding appropriate nonspeech analogs for speech stimuli. If the analogs are
too similar to speech, they may be perceived as speech and thereby cease to be
good analogs and become bad speech. If they are too different from speech,
the generalizability of the findings to speech may be questioned. There is a
way out of this dilemma: If stimuli could be constructed that are
sufficiently like speech to be perceived as speech by some listeners but not
by others
(perhaps prompted by different instructions), or even by the same listeners on 
different occasions, and if different results are obtained in the two 
conditions (e.g., two cues trade in one but not in the other), this would then 
be proof of specialized perceptual processes serving speech perception. 

It is from this perspective that a recent study by Best et al. (1981) 
receives special importance. These authors investigated the trading relation 
between silent closure duration and F1 transition onset frequency as cues to 
stop manner in the "say"-"stay" contrast. After replicating the results
obtained with the similar "slit"-"split" contrast by Fitch et al. (1980),
they proceeded to test for the presence of a similar trading relation in
"sinewave analogs" of the synthetic "say"-"stay" stimuli. Sinewave analogs
are obtained by imitating the formant trajectories of (voiced) speech stimuli
with pure tones. Such analogs of simple CV syllables have been used
previously by Cutting (1974) and by Bailey, Summerfield, and Dorman (1977),
whose work is discussed below; recently, Remez, Rubin, Pisoni, and Carrell
(1981) successfully synthesized whole English sentences in that way. The
interesting thing about these stimuli is that they are heard as nonspeech
whistles by the majority of naive listeners, but they may be heard as speech
when instructions point out their speechlikeness, or spontaneously after
prolonged listening. Once heard as speech, it is difficult (if not
impossible) to hear them as pure whistles again, although the speech heard
retains a highly artificial quality (Remez et al., 1981). This phenomenon was
exploited by Best et al. in their main experiment.
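
The construction of a sinewave analog can be sketched as follows: each formant
trajectory is replaced by a single pure tone that follows the same frequency
path, and the tones are summed. The sketch assumes linear frequency glides and
equal tone amplitudes; the function name, the sampling rate, and the frequency
values are illustrative and are not those used by Best et al. (1981).

```python
import math


def sinewave_analog(tracks_hz, duration_s, sample_rate=10000):
    """Sum one pure tone per formant track; each track glides linearly from start to end."""
    n = round(duration_s * sample_rate)
    phases = [0.0] * len(tracks_hz)
    samples = []
    for i in range(n):
        frac = i / max(n - 1, 1)  # position within the stimulus, from 0.0 to 1.0
        total = 0.0
        for k, (start_hz, end_hz) in enumerate(tracks_hz):
            freq = start_hz + (end_hz - start_hz) * frac
            # Accumulate phase so each tone stays continuous while its frequency changes.
            phases[k] += 2.0 * math.pi * freq / sample_rate
            total += math.sin(phases[k])
        samples.append(total / len(tracks_hz))  # keep the sum within [-1, 1]
    return samples


# Hypothetical tracks standing in for F1, F2, and F3 of a vocalic portion.
ANALOG = sinewave_analog([(230.0, 600.0), (1800.0, 2000.0), (2700.0, 2800.0)], 0.2)
```

Changing only the starting frequency of the first track would give the kind of
low-onset and high-onset versions of the tone simulating F1 that the
experiment contrasts.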

They constructed sinewave analogs of a "say"-"stay" continuum by following
a noise resembling [s]-frication with varying periods of silence and a
sinewave portion whose component tones imitated the first three formants of
the periodic portion of the speech stimuli. There were two versions of the 
sinewave portion, one with a low onset of the tone simulating F1, and one with 
a high onset. (In speech stimuli, less silence is needed to change "say" to 
"stay" when F1 has a low onset than when it has a high onset.) The sinewave 
stimuli were presented to listeners in an AXB format, where the critical X 
stimulus had to be designated as being more similar to either the A or the B 
stimulus, which were analogs of a clear "say" (no silence, high F1 onset) and 
a clear "stay" (long silence, low F1 onset), respectively. Some of the 
subjects were told that the stimuli were intended to sound like "say" or
"stay," whereas others were only told that the stimuli were computer sounds.
After the experiment, the subjects were divided into those who reported that
they heard the stimuli as "say"-"stay," either spontaneously or after
instructions, and into those who reported various auditory impressions or
inappropriate speech percepts. Only members of the first group, who (according
to their self-reports) employed a phonetic mode of perception, showed a
trading relation between silence and F1 onset frequency, and this trading
relation resembled that obtained with synthetic speech stimuli. None of the
other
subjects showed this pattern of results. These other subjects could be 
further subdivided into two groups: those who reported that the stimuli 
differed in the amount of separation between the two stimulus portions (noise
and sinewaves), and those who reported that the stimuli differed in the
quality of the onset of the second portion ("water dripping," "thud," etc.). 
The AXB results substantiated these reports: The results of the first group 
indicated that the subjects paid attention only to the silence cue, whereas 
the second group seemed to make their judgments primarily on the basis of the 
spectral cue (F1-analog onset frequency). The response patterns of the two 
groups were radically different from each other, and both were different from 
the group who heard the stimuli as speech. It seems reasonable to conclude 
that the subjects in the former two groups employed an auditory mode of
perception. Being in this mode, they were unable to integrate the two cues
into a unitary percept and instead focused on one or the other cue separately,
thereby disconfirming the auditory integration hypothesis for this set of
cues. 6 There was some evidence of an auditory interaction in that those
listeners who paid attention to the spectral cue were affected by the setting
of the temporal cue. However, this effect was not sufficiently strong to
account for the trading relation observed in speech-mode listeners; moreover,
those subjects who focused on the silence cue (which is the primary cue for
stop manner) were not affected at all by the setting of the spectral cue.


The results of Best et al. provide the strongest evidence we have so far 
that a trading relation is specific to phonetic perception: When listeners
are not in the speech mode, the trading relation disappears and selective
attention to individual acoustic cues becomes possible. The data argue 
against any auditory explanation of the trading relation at hand, and they 
support the existence of a phonetic mode of perception that is characterized 
by specialized ways of stimulus processing. Results from a recent study
(Repp, 1981b) further confirm the phonetic nature of the trading relation 
between silence and F1 onset for the "say"-"stay" distinction by showing that 
it is obtained only in the phonetic boundary region of the speech continuum 
(i.e., when listeners can make a phonetic distinction) but not within the
"stay" category (i.e., when listeners cannot make a phonetic distinction and
must rely on auditory criteria for discrimination). We may suspect that many 
other trading relations will behave similarly. This is already indicated for 
the trading relation between closure duration and fricative noise duration in
the "say shop"-"say chop" distinction (Repp, 1981b) and for that between
fricative noise spectrum and formant transitions in the [ʃ]-[s] distinction
(Repp, 1981a, discussed in the next section). 

How, then, are trading relations to be explained, if not in terms of 
auditory interactions or integration? The proposed answer is this: Speech is 
produced by a vocal tract, and the production of a phonetic segment (assuming 
that such segments exist at some level in the articulatory plan) has complex 
and temporally distributed acoustic consequences. Therefore, the information 
supporting the perception of the same phonetic segment is acoustically diverse 
and spread out over time. The perceiver recovers the abstract units of speech 
by integrating the multiple cues that result from their production. The basis 
for that perceptual integration may be conceptualized in two ways. One is to
state that listeners know from experience how a given phonetic segment "ought
to sound" in a given context. Since phonetic contrasts almost always
involve more than one acoustic property, trading relations among these
properties must result when the stimulus is ambiguous because, in this view,
it is being evaluated with reference to idealized representations or
"prototypes" that differ on all these dimensions simultaneously: A change in
one dimension can be offset by a change in another dimension, so that the
perceptual distances from the prototypes remain constant. The other
possibility is that perceptual integration does not require specific knowledge of
speech patterns (whose form of memory storage is difficult to conceptualize) 
but is predicated directly upon the articulatory information in the signal. 
In other words, trading relations may occur because listeners perceive speech 
in terms of the underlying articulation, and inconsistencies in the acoustic 
information are resolved to yield perception of the most plausible articulato- 
ry act. This explanation thus requires that the listener have at least a 
general model of human vocal tracts and of their ways of action. The question 
remains: How much must an organism know about speech to exhibit a phonetic 
trading relation? An important issue for future research will be the question 
whether phonetic trading relations are obtained in human infants, and if not, 
how and when they begin to develop. 7 
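
The prototype account described above can be given a minimal numerical sketch.
It assumes a two-dimensional cue space in normalized units, Euclidean
distance, and two hypothetical prototype positions; none of these choices
comes from the studies reviewed, which do not specify a metric.

```python
import math

# Hypothetical prototypes in a normalized cue space: (silence, F1 onset height).
PROTOTYPES = {
    "say": (0.0, 1.0),   # no silence, high F1 onset
    "stay": (1.0, 0.0),  # long silence, low F1 onset
}


def stay_preference(silence, f1_onset):
    """Distance to the "say" prototype minus distance to the "stay" prototype.

    Positive values mean the stimulus lies closer to "stay"; zero means it is
    equally far from both prototypes, i.e., phonetically ambiguous.
    """
    stimulus = (silence, f1_onset)
    return math.dist(stimulus, PROTOTYPES["say"]) - math.dist(stimulus, PROTOTYPES["stay"])
```

Under these assumptions a stimulus at (0.3, 0.3) is ambiguous. Raising its F1
onset to 0.5 moves it toward "say," and lengthening the silence to 0.5 as well
restores the ambiguity: the change on one dimension has been offset by a
change on the other.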

Context Effects 

Effects Due to Immediate Phonetic Context 

Like phonetic trading relations, certain kinds of phonetic context 
effects have been known for a long time. The most familiar example is, 
perhaps, the dependence of stop release burst perception on the following
vowel. Liberman, Delattre, and Cooper (1952) showed that, when noise bursts
of varying frequencies are followed by different steady-state periodic stimu- 
li, the stop consonant categories reported by listeners may depend on the 
quality of the vowel. For example, if a noise burst centered at 1600 Hz is 
followed by steady states appropriate for [i] or [u], listeners report "p," 
but if [a] follows, they report "k." 

A similar effect has been reported by Summerfield (1975) who found that 
the nature of the vowel influences the location of the boundary on a continuum 
of stop-consonant-vowel syllables varying in VOT. This context effect may
actually be a trading relation because it probably reflects the influence of 
F1 onset (rather than vowel quality per se ) on the voicing decision, i.e., a 
trading relation between F1 onset and VOT (cf. Summerfield & Haggard, 1974, 
1977). Recently, Summerfield (in press) conducted an important series of 
experiments in which he tested whether this effect has an auditory basis. He
used speech stimuli varying in VOT and in the F1 frequency of the following 
steady-state vocalic portion, and he compared their perception with that of 
two kinds of nonspeech analogs. One was a tone-onset-time (TOT) continuum 
(Pisoni, 1977) that varied the relative onset time of two pure tones of fixed 
frequency, matched in frequency and amplitude to the first two formants of the 
speech stimuli. The frequency of the lower tone was varied to simulate 
different F1 onset frequencies. The other set of nonspeech stimuli formed a 
noise-onset-time (NOT) continuum (cf. Miller et al., 1976) that varied the
lead time of a noise-excited steady-state F2 relative to a periodically 
excited steady-state F1. Different F1 onset frequencies were simulated by 
varying the frequency of F1. The stimuli were presented for identification as 
"g" or "k" (speech) or as "simultaneous onset" vs. "successive onset"
(nonspeech). While the VOT boundary exhibited the expected sensitivity to F1
onset frequency, neither nonspeech continuum evinced any reliable influence of
F1(-analog) frequency on listeners' judgments. Pastore et al. (1981) recently
reported a similar failure to find equivalent effects of two different
secondary variables (rise time and trailing stimuli) on VOT and TOT category
boundaries. These results suggest that the context effect obtained in speech
does not have an auditory basis but is specific to the phonetic mode.
(However, see Footnote 7.) 
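
A boundary shift of the kind discussed here is usually quantified by locating the point on the continuum where the identification function crosses 50 percent. The Python sketch below shows one common way of doing this, by linear interpolation between adjacent continuum steps. It is an illustration only: the function name `crossover` and the response proportions are invented for this example and are not data from Summerfield or any other study cited here, and the direction of the shift is an assumption of the made-up numbers.

```python
def crossover(steps, proportions, criterion=0.5):
    """Return the stimulus value at which the response proportion crosses
    the criterion, using linear interpolation between adjacent steps."""
    pairs = list(zip(steps, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        crosses = (p0 - criterion) * (p1 - criterion) <= 0
        if crosses and p0 != p1:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("identification function never crosses the criterion")


# Invented identification data: proportion of "k" responses at each VOT step
# in two hypothetical F1-onset conditions (not taken from any published study).
vot_ms = [0, 10, 20, 30, 40, 50]
prop_k_low_f1 = [0.0, 0.1, 0.4, 0.8, 1.0, 1.0]
prop_k_high_f1 = [0.0, 0.3, 0.7, 0.9, 1.0, 1.0]

boundary_low = crossover(vot_ms, prop_k_low_f1)
boundary_high = crossover(vot_ms, prop_k_high_f1)
boundary_shift = boundary_low - boundary_high  # size of the context effect in ms
```

Applying the same procedure to the nonspeech (TOT or NOT) judgments would give a shift near zero if, as reported, the F1-analog frequency had no reliable influence.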

An effect of vocalic context on the perception of stop consonant place of
articulation was investigated by Bailey et al. (1977). These authors
constructed two synthetic speech continua ranging from [b]+vowel to [d]+vowel
by varying the transition onset frequencies of F2 and F3. The two continua
differed in the terminal (steady-state) frequency of F2, which was high in one
and low in the other. On each continuum, the transition onsets were arranged 
so that the center stimulus had completely flat F2 and F3, while both 
transitions rose in one endpoint stimulus to the same degree as they fell in 
the other endpoint stimulus. When these stimuli were presented to subjects 
for classification in an AXB task, it turned out that the category boundaries 
were at different locations on the two continua, neither being exactly in the 
center; one (on the continuum with the low-F2 vowel) was displaced toward the
[d] end, while the other boundary was displaced toward the [b] end. Bailey et
al. wished to test whether this difference (a kind of context effect,
especially when "rising vs. falling transitions" is considered the relevant
cue, rather than absolute transition onset frequency, which varied with
context) has a psychoacoustic basis. They pioneered in using sinewave analogs
for that purpose. The sinewave stimuli were presented in the same AXB
paradigm to a group of subjects that was subdivided afterwards according to
self-reports of whether or not the stimuli were heard as speech. It turned out
that those listeners who claimed to hear [b] and [d] had their category
boundaries on the two continua at different locations that corresponded to
those found with speech stimuli. The other listeners, however, who reported
only nonspeech impressions, had their boundaries close to the centers of both
continua, as one might predict on psychophysical grounds. This experiment
provided evidence that phonetic categorization is based on principles
different from those of auditory psychophysics. Presumably (although this was
not shown directly by Bailey et al.), the asymmetrical boundaries obtained with
speech stimuli were in accord with the acoustical characteristics of typical
stop consonants in these particular vocalic contexts.

Let us turn now to other context effects that are of special interest 
because they involve segments not as obviously interdependent as stop conso- 
nants and following vowels. One effect concerns the influence of vocalic 
context on fricative perception. If a noise portion ambiguous between [ʃ] and
[s] is followed by a periodic portion appropriate for a rounded vowel such as
[u], listeners are more likely to report "s" than if the following vowel is
unrounded, e.g., [a] (Kunisaki & Fujisaki, Note 5; Mann & Repp, 1980; Whalen,
1981). A preceding vowel has a similar, but smaller effect (Hasegawa, 1978).
In addition to roundedness, other features of the vowel (such as the
front-back dimension) also seem to play a role (Whalen, 1981). Repp and Mann
(1981a) also discovered a small but reliable effect of a following stop
consonant on fricative perception: Listeners are more likely to report "s"
when the formant transitions in the following vocalic portion (separated from 
the noise by a silent closure interval) are appropriate for [k] than when they 
are appropriate for [t]. 

Several effects of context on the perception of stop consonants have been 
discovered in recent experiments. Mann and Repp (1980) found that, in
fricative-stop-vowel stimuli, listeners are more likely to report "k" when
vocalic stimuli with formant transitions ambiguous between [t] and [k] are
preceded by an [s]-noise plus silence than when they are preceded by an
[ʃ]-noise plus silence. They showed that the effect has two components, one due
to the spectral characteristics of the fricative noise (perhaps an auditory
effect) and the other to the category label assigned to the fricative (which
must be a phonetic effect). Subsequently, Repp and Mann (1981a) showed the
context effect to be independent of the effect of direct cues to stop place of
articulation in the fricative noise offset spectra (which proves that it is a
true context effect and not a trading relation), and they also ruled out
simple response bias as a possible cause. In a further experiment, Mann
(1980) found that, when stimuli ambiguous between [da] and [ga] were preceded
by either [al] or [ar], listeners reported many more "g" percepts after [al]
than after [ar]. In experiments with vowel-stop-stop-vowel stimuli, Repp
(1978a, 1980a, 1980b) found various perceptual interdependences between the
two stops cued by the formant transitions on either side of the closure
interval; in particular, perception of the first stop was influenced strongly
by the second. 

How are all these effects to be explained? Auditory explanations would
have to be formulated in the manner of the interaction hypothesis for trading
relations: The perception of the relevant acoustic cues is somehow affected
by the context. As in the case of trading relations, however, no plausible
mechanisms that might mediate such effects have been suggested, and no similar
effects with nonspeech analogs have been reported so far. On the other hand, 
reference to speech production provides a straightforward explanation of most, 
if not all, context effects. Just as trading relations reflect the dynamic 
nature of articulation (of a given phonetic segment), so are context effects 
accounted for by coarticulation (of different phonetic segments). The
articulatory movements characteristic of a given phonetic segment exhibit
contextual variations that may be either part of the articulatory plan
(allophonic variation, or anticipatory coarticulation) or due to the inertia
of the articulators (perseverative coarticulation). Presumably, human
listeners possess implicit knowledge of this coarticulatory variation.

Coarticulatory effects corresponding to the perceptual phenomena just 
cited have been observed in most cases. Thus, it is well known that the 
release burst spectrum of stop consonants varies with the following vowel
(Zue, Note 6) in a manner quite parallel to the perceptual findings of
Liberman et al. (1952). Fricative noises exhibit a downward shift in
spectrum when they precede or follow a rounded vowel, due to anticipatory or
carry-over lip rounding (Fujisaki & Kunisaki, 1978; Hasegawa, 1976; Mann &
Repp, 1980), which explains the effect of vocalic context on fricative
perception. The formant transitions of stop consonants vary with preceding
fricatives (Repp & Mann, 1981a, 1981b) and liquids (Mann, 1980) in a manner
consistent with the corresponding perceptual effects. Thus, the available
evidence suggests that most perceptual context effects are paralleled by
coarticulatory effects. The implication is, then, that listeners expect
coarticulation to occur and compensate for its absence in experimental stimuli
by shifting their response criteria accordingly. For example, if an [ʃ]-like
noise followed by [u] is not sufficiently low on the spectral scale (as it
should be because of anticipatory lip rounding), it might be perceived as an
"s." Thus, the evidence is highly persuasive that context effects, just like
trading relations, reflect the listeners' intrinsic knowledge of articulatory
dynamics.

A critical test of the auditory vs. phonetic explanations of context
effects can again be performed with appropriate nonspeech analogs, or with 
stimuli that can be perceived as either speech or nonspeech. Two such studies 
(Bailey et al., 1977; Summerfield, in press) were discussed above. In a 
recent experiment, I took an alternative approach (Repp, 1981a): Rather than 
using nonspeech stimuli that can be perceived as speech, I used speech stimuli 
(a portion of) which can be fairly readily perceived as nonspeech. Although 
it is usually difficult to abandon the phonetic mode when listening to speech, 
except in cases where the speech is strongly distorted or poorly synthesized, 
fricative-vowel syllables offer an opportunity to do so because they contain a 
sizable segment of fairly steady-state noise whose auditory properties 
("pitch," length, loudness) are relatively accessible. In my study, the 
fricative noise spectrum was varied along a continuum from [ʃ]-like to
[s]-like, and the vowel was either [a] or [u]. It was known from earlier
experiments (Mann & Repp, 1980) that listeners are more likely to label the
fricative "s" in the context of [u] than in the context of [a]. A secondary
cue to the [ʃ]-[s] distinction was deliberately confounded with the context
effect: The [a] vocalic portion contained formant transitions appropriate for
[ʃ], and the [u] portion contained transitions appropriate for [s]; this
increased the differential effect of the two vocalic contexts on fricative
identification. (Thus, this experiment tested a context effect and a trading
relation at the same time.) The stimuli were subsequently presented in a
same-different discrimination task where the difference to be detected was in the
spectrum of the noise portion, and the vowels were either the same or 
different, but irrelevant in any case. The majority of naive subjects 
perceived these stimuli fairly categorically: Their discrimination perfor- 
mance was poor; the pattern of responses suggested that they relied on 
category labels; and there were pronounced effects of vocalic context, just as 
in previous labeling tasks. Two subjects, however, performed much better than 
the others. Their data resembled those of three experienced listeners who 
also participated in the experiment. Comments and introspections of these 
subjects suggested that they were able to bypass or ignore phonetic categori- 
zation and to focus instead on the spectral properties (the "pitch") of the 
fricative noise. The crucial result was that these listeners not only 
performed much better than the rest (which supports the hypothesis that they 
employed an auditory mode of perception), but that they did not show any 
effect of vocalic context. These results were confirmed in a follow-up study 
where naive listeners were induced (with some success) to adopt an auditory 
listening strategy. These experiments demonstrate that vocalic context
affected the perceived phonetic category of the fricative but not the perceived
pitch quality of the noise. Therefore, the context effect due to the quality 
of the vowel, as well as the cue integration underlying the contribution of 
the vocalic formant transitions to fricative identification, must be phonetic 
in nature. 

Speaker Normalization Effects 

A phenomenon related to the context effects just discussed is that of 
speaker normalization. In an experimental demonstration of this effect, the 
perception of a critical phonetic segment is influenced, not by a phonetic 
change in an adjacent segment, but by an acoustic change such as might result 
from a change in speaker. For example, a (roughly proportional) upward shift 
of vowel formants on the frequency scale signifies that the speech signal 
originated in a smaller vocal tract. (How listeners "decide" that the same
vowel has been produced by a smaller vocal tract, rather than a different
vowel by the same vocal tract, is an unresolved issue.) Such a change may
influence the perception of phonetic segments in the vicinity, as long as the 
listener perceives the whole test utterance as coming from a single speaker's 
vocal tract. 

Although speaker normalization is a well-recognized problem in speech 
recognition research, there have been relatively few experimental studies.
Rand (1971) constructed stop consonant continua ranging from /b/ to /d/ to /g/
by varying the onset of the F2 transition of three synthetic two-formant 
stimuli intended to represent, respectively, an /æ/ produced by a large vocal
tract, an /æ/ produced by a small vocal tract (differing from the former only
in F2 frequency), and an /ɛ/ produced by a large vocal tract (differing from
the former only in F1 frequency). The results showed similar category 
boundaries (expressed in terms of absolute F2 onset frequency) for the two 
stimulus continua associated with large vocal tracts, but a shift towards
higher frequencies on the continuum associated with a small vocal tract. Rand 
interpreted his findings as evidence for perceptual normalization, although 
this may not be the only possible explanation. 

In a more recent study, May (1976) followed fricative noises from a
synthetic [ʃ]-[s] continuum with one of two synthetic periodic portions,
intended to represent the same vowel produced by two differently-sized vocal
tracts. The [ʃ]-[s] boundary shifted as expected: Listeners reported more
"s" percepts in the context of the larger vocal tract. Subsequently, Mann and
Repp (1980) conducted a similar experiment in which synthetic fricative noises
were followed by vocalic portions derived from natural utterances produced by
a male or a female speaker. The results replicated those by May. These
findings are consistent with the fact that smaller vocal tracts (females)
produce fricative noises of higher average frequency than large vocal tracts
(males) (Schwartz, 1968). 

To these results must be added the evidence from studies that have shown
speaker normalization effects due to "remote" context, i.e., due to other
stimuli in a sequence or to precursor stimuli or phrases (e.g., Ladefoged &
Broadbent, 1957; Strange, Verbrugge, Shankweiler, & Edman, 1976; Summerfield &
Haggard, 1975). They all demonstrate the same point: Listeners interpret the
speech signal in accordance with the perceived (or expected) dimensions of the
vocal tract that produced it. Information about vocal tract size is picked up 
in parallel with information about articulator movements; these are, respec- 
tively, the static and dynamic (or structural and functional) aspects of 
articulatory information. Speaker normalization effects are difficult to 
explain in terms of a general auditory theory that does not make reference to 
the mechanisms of speech production. Although some effects could, in
principle, result from auditory contrast, interactions of similar complexity have
not yet been demonstrated in nonspeech contexts. 

Rate Normalization Effects

The somewhat larger literature on perceptual effects of speaking rate has
recently been thoroughly reviewed by Miller (1981). Rate normalization, like
speaker normalization, is a kind of context effect, and it can be produced by 
either close or remote context. Rate normalization is said to occur when the 
perception of a phonetic distinction signalled by a temporal cue (i.e., by the
duration of a stimulus portion, or by the rate of change in some acoustic
parameter) is modified after a temporal change is introduced in portions of
the context that are not themselves cues for the perception of the target
segment.

Only a few representative findings shall be mentioned here. Miller and
Liberman (1979) examined the stop-semivowel distinction (/ba/-/wa/), cued by
the duration and rate of the initial formant transitions, and found that the
category boundary shifted systematically with the duration of the vocalic
portion (i.e., of the whole stimulus). A corresponding shift of the
discrimination peak in an oddity task was reported by Miller (1980). This effect may
have an auditory basis, for it has not only been found in human infants (Eimas
& Miller, 1980) but also with analogous nonspeech stimuli (Carrell, Pisoni, &
Gans, Note 7). However, it may also be argued that simple durational
variation is not sufficient to create variations in perceived speaking rate. 

Fitch (1981) recently attempted to dissociate information about speaking 
rate from phonetically distinctive durational variation. The phonetic
distinction studied was that between [dabi] and [dapi], as cued by the duration
of the first stimulus portion ([dab] or [dap]). By manipulating the duration 
of natural utterances produced at different rates, she was able to show that 
speaking rate had a perceptual effect separate from that of physical duration. 
Thus, the information about speaking rate seems to be carried, in part, by 
more complex structural variables, such as the rate of spectral change in the 
signal. Soli (in press) has recently obtained similar results in a thorough 
investigation of cues to the [Jus]-[Juz] distinction. These findings are 
considerably more difficult to explain by psychoacoustic principles. 

The most convincing instances of rate normalization derive from studies 
that varied remote context. The perception of a variety of phonetic distinc- 
tions is sensitive to the perceived rate of articulation of a carrier sentence 
(e.g., Miller & Grosjean, 1981; Pickett & Decker, 1960; Summerfield, 1981). 
Miller and Grosjean (1981) showed that the articulation rate of the carrier 
sentence was more important than its pause rate, even though the critical
phonetic contrast ("rabid"-"rapid") was cued primarily by the perceived
duration of a silent interval. Findings such as these suggest that speaking
rate is a rather abstract property whose perception requires an appreciation
of articulatory and linguistic variables (cf. also Grosjean & Lane, 1976).
Summerfield (1981) has shown that the rate of a nonspeech carrier (a melody) 
does not affect speech perception, confirming that the listener's rate 
estimate must derive from speech to be relevant. 

These findings are just a sampling of a much larger literature on
perceptual adjustments for speaking rate (see Miller, 1981). Whether or not
there are corresponding contextual effects in the judgment of auditory
duration is not known (except for the above-cited study by Carrell et al., 
Note 7), although there is some plausibility in the hypothesis that the 
durations of adjacent or corresponding auditory intervals are judged relative 
to each other. Perhaps because this hypothesis seems more plausible than 
possible auditory explanations of other context effects in speech, there have
been few attempts so far to simulate speaking rate effects using nonspeech 
analog stimuli. However, there is some evidence that even simple durational 
changes may be interpreted differently in speech and nonspeech modes. Smith 
(1978) presented two identical syllables in succession and varied their
relative durations. Listeners had to judge either which syllable was more
stressed (a linguistic judgment) or which syllable was longer in duration (an 
auditory judgment). The two kinds of judgment diverged: Stress judgments 
exhibited a tendency for the first syllable to be judged stressed, whereas 
duration judgments showed no such bias. These results indicate that the 
linguistic function of acoustic segment duration cannot be directly predicted 
from auditory judgments of that duration. Presumably, in speech perception,
acoustic segment duration is interpreted, as are all other cues, within a
framework of tacitly known articulatory patterns and constraints, such as the
well-known lengthening of a final syllable (Klatt, 1976).

Sequential (Remote) Context Effects

Context effects due to preceding and following stimuli in a test sequence 
are a ubiquitous phenomenon and well-known also in auditory psychophysics.
They include effects of neighboring stimuli (preceding and/or following a 
target stimulus), as well as effects due to a whole series of preceding 
stimuli, referred to variously as selective adaptation, anchoring, range, or 
frequency effects. Even though these effects are clearly not in any way
specific to speech (and speech stimuli are by no means immune to them, as was
once believed with regard to anchoring [Sawusch & Pisoni, 1973; Sawusch,
Pisoni, & Cutting, 1974]), the pattern of the data obtained for speech may
nevertheless exhibit peculiarities not observed with nonspeech stimuli. The
most striking of these is, of course, the relative stability of phonetic
boundaries. Although all boundaries can be shifted to some extent by
contextual influences, most boundaries do not change very much. (Isolated
vowels are a significant exception; see below.) Presumably, this is so because
listeners have internal criteria based on their long experience with speech,
and especially with their native tongue. It might be argued that phonetic
boundaries are stable because they coincide with auditory boundaries of some
sort. However, the evidence for such a coincidence is not convincing (see my
earlier discussion of categorical perception), and nonhuman subjects seem to
exhibit much larger range-contingent boundary shifts for speech stimuli than
adult human subjects (Waters & Wilson, 1976).

Another example of an interesting discrepancy between speech and non- 
speech is provided by the pattern of vowel context effects. Repp et al. 
(1979) found not only that isolated synthetic vowel stimuli presented in pairs 
exhibit large contextual effects (as shown earlier by Fry, Abramson, Eimas, & 
Liberman, 1962; Lindner, 1966; Thompson & Hollien, 1970; and others), but also
that backward contrast (the influence of the second stimulus on perception of
the first) was stronger than forward contrast (the influence of the first
stimulus on perception of the second). These results become interesting in
the light of later findings that nonspeech stimuli show (surprisingly) much
smaller contrast effects than isolated vowels and no (or the opposite)
difference between forward and backward contrast. Healy and Repp (in press)
obtained these results by comparing vowels from an [i]-[I] continuum with
brief nonspeech "timbre" stimuli (single-formant resonances of varying
frequency, labeled as "low" or "high"). Fujisaki and Shigeno (1979) also
compared vowels with timbre stimuli that, however, had the same duration, and
still found a large difference in the magnitude of contrast effects, and
larger backward than forward contrast for vowels only. Shigeno and Fujisaki
(Note 8) compared phonetic category judgments of vowels varying in spectrum
with pitch judgments of a single vowel varying in F0. While the former
condition replicated earlier findings (large contrast effects, more backward
than forward contrast), there were no contrast effects at all in the latter 
condition. While it seems possible that an auditory explanation of these 
results will eventually be found, the peculiar flexibility of vowel perception 
may also be grounded in the special status of vowels as nuclear elements in 
the speech message. Perhaps the modifiability of vowel perception
corresponds to the remarkable contextual variability vowels exhibit in the speech
signal.

Other Perceptual Integration Effects 

A discussion of evidence for a phonetic mode of perception would not be
complete without mention of two strands of research that make a particularly
important contribution. They both deal with the integration of cues separated
not in time but in space or even occurring in different modalities. 

Duplex Perception 

Duplex perception is the newly coined (Liberman, 1979) name for a
phenomenon originally discovered by Rand (1974) and described earlier in this
paper: An isolated formant transition presented to one ear simultaneously
with the "base" (a synthetic CV syllable bereft of that formant transition) in
the other ear is perceived as a lateralized nonspeech "chirp" although, at the
same time, it contributes (presumably, by some process of central integration)
to the perception of the syllable in the other ear. The phenomenon by itself
shows that the same input may be perceived in auditory and phonetic modes at
the same time: The transition is auditorily segregated, yet phonetically
integrated with the base. Several recent studies show that various
experimental variables affect either the auditory or the phonetic part of the duplex
percept, but not both. 

Thus, Isenberg and Liberman (1978) varied the intensity of the isolated 
transition. The subjects perceived changes in the loudness of the chirp, but 
they could not detect any change in the loudness of the syllable in the other 
ear, even though they perceived the phonetic segment specified by the 
transition. Liberman, Isenberg, and Rakerd (1981) immediately preceded the 
base with a fricative noise appropriate for [s], which (in the absence of any 
intervening silence) inhibited the perception of the stop consonant ([p] or
[t]) that the base in conjunction with the transition in the other ear
otherwise would have generated. Listeners found it difficult to discriminate 
[s]+[pa] and [s]+[ta] as long as they attended to the side on which the speech 
was heard, for both stimuli sounded like [sa]. However, their discrimination
of [p]-chirps from [t]-chirps in the other ear was highly accurate. Recently,
Mann, Madden, Russell, and Liberman (1981) used the duplex perception paradigm
to examine further the effect (discovered by Mann, 1980) of a preceding liquid 
on stop consonant perception. When the syllables [al] or [ar] preceded the 
base of a stimulus from a [ta]-[ka] continuum, the context effect was obtained
in phonetic perception (more [ka] percepts following [al]) while the percep- 
tion of the isolated transition in the other ear was unaltered. 

Effects similar to duplex perception have been reported, where some 
nonspeech stimulus in one ear affected phonetic perception in the other ear 
while retaining its nonspeech quality. For example, Pastore (1978) found
that when the syllable [pa] in one ear was accompanied by a burst of noise in
the other ear, phonetic perception changed to [ta]. Apparently, the noise
(even though it did not have the appropriate timing, duration, and envelope)
was interpreted by listeners as a [t]-release burst and was integrated with
the syllable in the other ear. There is no doubt, however, that listeners
nevertheless continued to hear a nonspeech sound in the ear in which the noise
occurred. The finding of Repp (1976) that the pitch of an isolated vowel in
one ear affects the perception of the voiced-voiceless distinction for
stop-consonant-vowel syllables in the other ear may be taken as another instance of
duplex perception. Presumably, listeners could have accurately judged the
pitch of the isolated vowel without destroying its phonetic effect. 

Duplex perception phenomena provide evidence for the distinction between 
auditory and phonetic modes of perception. They show that the auditory mode 
can gain access to the input from individual ears while the phonetic mode, 
under certain conditions, operates on the combined input from both ears. The
"phonological fusion" discovered by Day (1968), in which two dichotic
utterances such as "banket" and "lanket" yield the percept "blanket," is yet
another example of the abstract, nonauditory level of integration that
characterizes the phonetic mode.

Audio-Visual Integration 

Perhaps the most important recent discovery in the field is the finding
of an influence of visual articulatory information on phonetic perception
(McGurk & MacDonald, 1976; MacDonald & McGurk, 1978; Summerfield, 1979). Of
course, it has been known for a long time that lip reading aids speech 
perception, especially for the hard of hearing, but only recently has it 
become clear how tight audio-visual integration can be. McGurk and MacDonald
(1976) presented a video display of a person's face saying simple CV syllables 
in synchrony with acoustic recordings of syllables from the same set. When 
the visual and auditory information disagreed, the visual information exerted 
a strong influence on the subjects' percepts, primarily due to the readily
perceived presence vs. absence of visible lip closure. Thus, when a visual
/da/ or /ga/ was paired with an auditory /ba/, subjects usually reported
/da/. 8 

The interpretation of this finding is straightforward and of great
theoretical significance. Clearly, subjects somehow combine the articulatory 
information gained from the visual display with that gained from the acoustic 
signal. In Summerfield's (1979) words, "optical and acoustic displays are
co-perceived in a common metric closely related to that of articulatory dynamics"
(p. 314). This phenomenon provides some of the strongest evidence we have for
the existence of a speech-specific mode of perception that makes use of 
articulatory, as opposed to general auditory, information. The common metric 
of visual and auditory speech input represents a modality-independent, presum- 
ably articulation-based level of abstraction that is the likely site of the 
integration and context effects reviewed above. Phonetic perception in the 
auditory modality (when speech enters through the ears) is likely to be in
every sense as abstract as it is in the visual modality (when articulatory 
movements are observed directly). 

In a recent ingenious study, Roberts and Summerfield (1981) used the
audio-visual technique to demonstrate that selective adaptation of phonetic 
judgments is a purely auditory effect. Although conflicting visual
information changed the listeners' phonetic interpretation of an adapting stimulus,
it had no effect whatsoever on the direction or magnitude of the adaptation
effect. Besides its implications for the selective adaptation paradigm
(cf. also Sawusch & Jusczyk, 1981), this elegant study provides further
evidence for the autonomy of phonetic perception. 

Disruption of Perceptual Integration 

As was pointed out in the discussion of speaker normalization effects, a 
simulated change in vocal tract size (or in any other speaker characteristic, 
such as fundamental frequency) must not disrupt the perceptual coherence of an 
utterance if a normalization effect is to be observed. In the case of formant
transitions leading into a vocalic stimulus portion, or of an aperiodic 
portion (fricative noise) being followed by a periodic portion, perceptual 
coherence is easily maintained when the formant frequencies of the vowel are 
changed. However, when two periodic signal portions appropriate to different
vocal tracts are juxtaposed, a change in speaker may be perceived, and this 
may lead to the disruption of whatever perceptual interactions (trading 
relations or context effects) may have taken place between the two periodic 
signal portions. There are several examples of this phenomenon in the recent 
literature. 

For example, Darwin and Bethell-Fox (1977) showed that, by changing 
fundamental frequency abruptly at points of transition, a speech stimulus 
originally perceived as a smooth alternation of a liquid consonant (or 
semivowel) and a vowel could be changed into a train of stop-vowel syllables 
perceived as being produced in alternation by two different speakers. The
manipulation of F0 signalled a change in source and thus "split" the formant
transitions into portions that effectively became new cues, signalling stop 
consonants rather than liquids or semivowels. 

Dorman et al. (1979: Exp. 6) studied a situation in which the perception
of a syllable-final stop consonant depends on whether or not there is a
sufficient period of (near-)silence to indicate closure. An utterance such as
/babda/ is generally perceived as /bada/ if the stop closure interval is
removed. Dorman et al. found, however, that when the first syllable, /bab/,
is produced by a male speaker and the second syllable, /da/, by a female
speaker, the syllable-final stop in /bab/ is clearly perceived. Because of
the perceived change in speakers, listeners no longer recognize the absence of
a closure interval; the critical syllable-final stop is now in utterance-final
position. Interestingly, two subjects who reported that they did not notice a
change in speaker also failed to perceive the syllable-final stop consonant
in the absence of closure.

Conversely, an interval of silence in an utterance may lose its perceptual
value when a change of speaker is perceived to occur across it (Dorman et
al., 1979: Exp. 7): When silence is inserted into the utterance "say shop"
immediately preceding the fricative noise, listeners report "say chop."
However, when "say" is spoken by a male voice and "shop" by a female voice,
this effect no longer occurs; the silence loses its phonetic significance, and
the second syllable remains "shop."9

This effect was further investigated by Dechovitz, Rakerd, and Verbrugge
(1980), who varied the perceived continuity of the test utterance "Let's go
shop (chop)" by having speakers produce either the whole phrase or just "Let's
go." Silence inserted between (or removed from between) the "go" and the
"shop (chop)" of a continuous utterance had the expected effect on phonetic
perception: "shop" was perceived as "chop" when silence was present, and
"chop" was perceived as "shop" when there was no silence. However, when the
"Let's go" with phrase-final intonation was followed by either "shop" or
"chop" from a different production, there were no such effects: "shop (chop)"
remained "shop (chop)." Interestingly, these authors found that a change of
speaker from female to male between "Let's go" and "shop (chop)" did not
disrupt perceptual integration as long as the "Let's go" derived from a
continuous utterance of "Let's go shop (chop)." This finding is in apparent
contradiction to that of Dorman et al. (1979) described in the preceding
paragraph. Dechovitz et al. interpreted it as showing that dynamic information
for utterance continuity may override a perceived change in source (despite
the concomitant auditory discontinuities). If this interpretation is correct,
it may point to another instance where purely auditory principles fail to
explain phonetic perception. Some of the variables that determine the
perceived continuity of an utterance are likely to be auditory (cf. Bregman,
1978); however, there may also be speech-specific factors that reflect what
listeners consider plausible and possible in the dynamic context of natural
utterances.



CONCLUSIONS 

The findings reviewed above provide a wealth of results that, in large 
measure, cannot be accounted for by our current knowledge of auditory 
psychophysics. Although there remains much to be learned about the perception 
of complex auditory stimuli, some trading relations and context effects seem a
priori unlikely to reflect an auditory level of interaction, and at least
one (audio-visual integration) simply cannot derive from that level. While
efforts to delineate the role of general auditory processes in speech
perception should certainly continue, it may be predicted that this role will 
be restricted largely to the perception of nonphonetic stimulus attributes. 

This is not to say that auditory properties of the signal are not the
basic carrier of the linguistic message. However, auditory psychophysics
gains knowledge about the perception of these properties in large part from
listeners' judgments in psychophysical experiments, and these judgments are
made in a different frame of reference from the judgments of speech. Auditory
variables, but not auditory judgments, are the basis of phonetic perception.
Even those limitations imposed by the auditory system that have to do with
detectability and resolution may not play any important role in phonetic
distinctions. For instance, there is no reason why phonetic category
boundaries could not be placed at suprathreshold auditory parameter settings
that seem arbitrary from a psychophysical viewpoint but are well motivated by
the articulatory and acoustic patterns that characterize a given language. And
even though phonetic and auditory boundaries may sometimes coincide, there is
the more fundamental question whether such "boundaries" play any role in the
perception of natural speech, considering the fact that natural speech is
different in a number of ways from the artificial stimuli employed in speech
discrimination tasks. While the objection of ecologically invalid stimuli
extends to most of the studies reviewed in this paper, the present emphasis
has been on processes of perceptual integration that promise to be more
general than static concepts such as boundary locations.

Two possible criticisms of the research reviewed here should be
mentioned. One is that nearly all studies demonstrated perceptual integration
in situations of high uncertainty produced by ambiguous settings of the
primary cue(s) for a given phonetic distinction. The perceptual integration
observed may have been motivated by that ambiguity. In that case, it may be
that perceptual integration does not occur to the same extent in natural
situations, where the primary cues are often sufficient for accurate phonetic
perception.

The other criticism is that, although the trading relations and context
effects reviewed here have been described as complex interactions between
separate cues, it may well be that these cues do not function as perceptual
entities that are "extracted" and then recombined into a unitary phonetic
percept (cf. Bailey & Summerfield, 1980). In that view, cues serve only
descriptive purposes; the perceptual interactions between them can be
understood as resulting from the listeners' apprehension of the articulatory
events they convey. While cues (i.e., acoustic segments) are indispensable for
describing how the articulatory information is represented in the signal, we
need not postulate special perceptual processes that construct or derive the
articulatory information from these elementary pieces. Rather, the
articulatory information may be said to be directly available (Gibson, 1966;
Neisser, 1976). This is an attractive proposal; however, we should not forget
that there are real questions to be answered about the mechanisms that
accomplish phonetic perception and that we know so woefully little about at
present. If cues and their interactions have no place in a description of
these mechanisms, we are faced with the more fundamental problem of finding
the proper ingredients for a model of speech perception.

There is reason to believe that the information processing approaches
currently in vogue are not likely to lead us very far in that regard. To
understand how our perceptual systems work, we need to understand how a
complex biological system (our brain) integrates and differentiates
information, how it is modified by experience, and how the structure of the
input (i.e., the environment) gets to be represented in the system. These are
complex biological questions whose solution will not come easily. Computer
analogies are largely tautological and distract from the fundamental
biological and philosophical problems that lie at the heart of the problem of
perception (see, e.g., Hayek, 1952; Piaget, 1967; Studdert-Kennedy, in press,
Note 9). In a particularly enlightening discussion, Fodor (Note 10) has
recently argued for the modularity of the speech (and language) system, i.e.,
for its specificity and relative isolation from other perceptual and cognitive
systems. He also pointed out that it is precisely such modular systems that
we have some hope of understanding, whereas explanations of perception in
terms of general principles remain interminably ad hoc. Thus, we should not
be surprised to find that speech perception is accomplished by means entirely
particular to that mode. The problem of how to investigate and describe those
means will keep us busy for some time to come.

1 A rule of thumb for distinguishing a trading relation from a context
effect is that the phonetic equivalence resulting from a trading relation is
strong in the sense that two phonetically equivalent stimuli (syllables or
words) are difficult to tell apart (Fitch et al., 1980), whereas the phonetic
equivalence produced by trading a critical cue against some contextual
influence is restricted to the target segment, as it always involves a readily
detectable change in one or more contextual segments. To the extent that a
change in context (e.g., vowel quality) also modifies critical cues (e.g.,
formant transitions), context effects may sometimes include disguised trading
relations.

2 The attempt to define integrated cues must be distinguished from
independent efforts to represent the speech signal in a way that takes into
account peripheral auditory transformations (Searle, Jacobson, & Rayment,
1979; Zwicker, Terhardt, & Paulus, 1979). Such representations are, of
course, very useful and may lead to the redefinition of some cues; however,
they do not, by themselves, solve the problem of cue definition.

3 In essence, this kind of study investigates whether multidimensionally
varying speech stimuli are perceived categorically. Traditional studies of
categorical perception have been exclusively concerned with stimuli varying on
a single dimension, or varying on several dimensions in a perfectly correlated
fashion. Note that, in these studies, physically different stimuli from the
region of the category boundary are not phonetically equivalent; they have
different response distributions. As soon as two or more cues are varied,
however, pairs of phonetically equivalent stimuli can be found for any given
response distribution. Thus, the influence of phonetic categorization on
discrimination judgments can be factored out, at least in principle (see
Footnote 4).

4 To produce precise (rather than just average) phonetic equivalence, it
would not only be necessary to take into account the fact that individual
listeners show trading relations of varying magnitude but also that (covert)
labeling responses may change in the context of a discrimination task (Repp et
al., 1979). Thus, the stimulus parameters would have to be adjusted separately
for each listener, based on labeling data collected with the stimulus
sequences of the discrimination task. This procedure would optimize the
opportunity to verify the prediction that stimuli in the conflicting-cues
condition are more difficult to discriminate than those in the
cooperating-cues condition, with the single-cue condition in between. However,
this order of difficulty is likely to obtain also when the choices of
parameters are less than optimal.

5 Most interestingly, the only completed study (so far) of a trading
relation in human infants (Miller & Eimas, Note 4) has yielded a positive
result: The boundary on a VOT continuum was significantly affected by the
duration of the formant transitions, a variable that is confounded with F1
onset frequency (cf. Summerfield & Haggard, 1977). Kuhl & Miller (1978)
obtained a similar result with chinchillas. This trading relation, at least,
appears to be of auditory origin, even though the principle involved is not
yet clear. It seems likely, though, that not all trading relations will
follow this pattern.

6 That the subjects focused on one cue only was a strategy furthered by
the AXB classification task of Best et al. In a different paradigm, the
subjects may pay attention to both cues at the same time (cf. Repp, 1981b).
The important point is that, in the auditory mode, the cues are not integrated
into a unitary percept, so that listeners may choose between
selective-attention and divided-attention strategies.

7 In that connection, the study of Simon and Fourcin (1978) might be
mentioned, which showed that the trading relation between VOT and F1
transition trajectory as cues to stop consonant voicing emerged at age 4 in
British children but was absent in 2- and 3-year-olds. Recently, however,
Miller & Eimas (Note 4) found a related trading relation (between VOT and
transition duration) in American infants. This conflict needs to be resolved.

8 I have experienced this effect myself (together with a number of my
colleagues at Haskins) and can confirm that it is a true perceptual
phenomenon, not some kind of inference or bias in the face of conflicting
information. The observer really believes that he or she hears what, in fact,
he or she only sees on the screen; there is little or no awareness of anything
odd happening. However, the effect is not always that strong; its presence and
strength depend on the particular combination of syllables, in a way that can
also, in part, be explained by reference to articulation. It is strongest
when the visual information makes the auditory information impossible in
articulatory terms. The details of the effect and of the relevant variables
remain to be investigated.

9 These experiments concern the disruption of perceptual integration of
cues. However, context effects can presumably be similarly blocked by a
change in apparent source. Diehl, Souther, and Convis (1980) recently
reported a study in which a rate normalization effect (of a precursor on the
/ga/-/ka/ distinction) was eliminated by a change of voice. Unfortunately,
their data were not entirely consistent and call for replication.

TEMPORAL PATTERNS OF COARTICULATION: LIP ROUNDING*
Fredericka Bell-Berti and Katherine S. Harris**



Abstract. According to some theories, anticipatory coarticulation
occurs when phones for which a feature is unspecified precede one
for which the feature is specified, with consequent migration of the
feature value to the antecedent phones. Carryover coarticulation,
on the other hand, is often attributed to "articulatory
sluggishness." In this paper, EMG evidence is provided that this
formulation is inadequate, since the beginning of EMG activity
associated with vowel lip rounding is independent of measures of the
acoustic duration of adjacent consonants. We suggest that the often
noted vowel-rounding gesture simply co-occurs during predictable
intervals with portions of preceding and following lingual consonant
articulations.



INTRODUCTION 

A central problem in understanding the relationship between speech
production and perception is the disparity between the perceptual
representation of speech as a series of discrete events, composed of partially
commutable elements, and the acoustic representation as a continuously varying
stream, without obvious phonetic segment markers. This acoustic stream is
generated by the activity of the several articulators, whose activity is
apparently continuous and context dependent. Many theories of coarticulation
attempt to solve the problem of context sensitivity by positing some kind of
speech synthesis process that occurs in production, and allows the fitting
together of the discrete units into the continuous stream. The task of the
theorist, then, is to write the adjustment rules.

In a widely cited theory of anticipatory coarticulation, Henke (1966)
provides a fairly typical formulation. Each phone in an articulatory string
is conceived as composed of a bundle of articulatory features. Anticipatory
coarticulation occurs when phones for which a given feature is unspecified
precede one for which the feature is specified, with consequent subjection of
the antecedent phones to the feature value of the following phone. Since time
is unspecified in the theory, the temporal duration occupied by the string of
antecedent phones is presumably irrelevant; all will acquire the same feature
value.

It has been claimed by Fowler (1980) that all such theories of
coarticulation belong to the class of extrinsic timing models of speech
production. Such models assume that the dimension of time is excluded from the
specification of a phonological segment in the motor plan for the utterance.
In Fowler's view, such accounts must therefore necessarily fail to explain or
predict coarticulation. While one may or may not accept her argument in its
larger theoretical framework, we believe that purely substantive evidence can
be marshaled against such phonological segment theories as a class.

In an earlier report (Bell-Berti & Harris, 1979) we provided evidence
that this formulation is inadequate, and have elsewhere suggested an
alternative hypothesis (Bell-Berti, 1980; Bell-Berti & Harris, [year
illegible]). Specifically, we found that if a rounded vowel was preceded by
one or two consonants presumably unspecified for rounding, the
electromyographic activity associated with rounding began a constant time,
rather than a constant number of segments, before the onset of the vowel.

The present experiment was designed to extend the earlier one in several
ways. First, we have examined both anticipatory and carryover coarticulation
of lip rounding. Often, "articulatory sluggishness" explanations are proposed
for carryover coarticulation while "planning" explanations are proposed for
anticipatory coarticulation (e.g., MacNeilage, 1970). However, if both
anticipatory and carryover effects appear to be guided by the same
articulatory rules, disparate explanations for these two effects seem less
plausible.

Secondly, we have examined the special case in which coarticulation
occurs from one vowel to another vowel, where both vowels are rounded and are
separated by intervening consonants without rounding specification. In such
cases, it has been shown that a "trough" will occur; that is, EMG activity
will be reduced at some point in the vowel-to-vowel period. This situation
is, of course, not explicable by the type of model of coarticulation
exemplified by Henke's, as we (Bell-Berti & Harris, 197?) and others (e.g.,
Gay, 1978) have pointed out.

Thirdly, we extended the design of the experiment to include longer
strings of consonants preceding or following the rounded vowel than the
original maximum of two-element clusters. We also increased the subject pool,
and included subjects naive to the purposes of the experiment.

Fourthly, we checked the subjects to see if orbicularis oris activity
occurred for segment sequences for which no lip rounding was specified. In a
theory like Henke's, it is assumed that a feature, such as lip rounding,
spreads from a phone for which it is specified, to the preceding phones for
which it is not. If the preceding phones carry a specification for the
feature, the experiment provides no test of the theory. Earlier studies
(Daniloff & Moll, 1968) have been criticized by later authors (Benguerel &
Cowan, 1974) for possible design flaws of this type. For the experiment
described here, we assume that the alveolars, especially /s/, are neutral with
respect to rounding. Hence, we would expect that in sequences of the form
/isi/, no EMG evidence of rounding would be observed, since the vowel /i/ is
traditionally characterized as spread, and the consonant /s/ is not
traditionally characterized with respect to lip rounding (Bronstein, 1960).
However, since traditional descriptions are often incomplete concerning
fine-grained articulatory detail, it seemed worthwhile to make an explicit
check of lip activity during the sequence /isi/ for each speaker.

As in the previous study, we have used an electromyographic indicator of
rounding, the activity of the orbicularis oris muscle. The relationship
between orbicularis oris activity and vowel rounding is well documented by a
number of studies (Harris, Lysaught, & Schvey, 1965; Fromkin, 1966; Tatham &
Morton, 1968; Sussman & Westbury, 1981).



METHODS 

Speech Materials 

The experimental speech materials were two-word phrases spoken within the
carrier phrase "It's a ___ again." The first word was one from the set
"lee, lease, leased, loo, loose, loosed," while the second word was one from
the set "tool, stool, teal, steel." All utterances whose second word was
either "tool" or "stool" will be called the "anticipatory" set in the
discussion below, since they were designed to examine anticipatory lip
rounding. Conversely, those utterances whose first word was "loo," "loose,"
or "loosed" and whose second word was "teal" or "steel" will be called the
"carryover" set.

In addition to these eighteen experimental utterances (12 in the
anticipatory and six in the carryover sets), we examined an additional group
that included "lee teal" and "lee seal," to determine whether a speaker
produced either or both of the alveolar consonants /t/ or /s/ with orbicularis
oris EMG activity, in the absence of a rounded vowel.

The experimental utterances were placed in randomized lists that included
additional items intended as foils. Five subjects read the randomized lists
until [number illegible] repetitions of each experimental utterance had been
recorded. A sixth subject produced only ten repetitions of each utterance
type. Subjects were asked to read the sentences from an orthographic
representation; this provided the phonetic sequences natural to the word
combinations (e.g., [listul] rather than [list tul] for "leased tool").

[Figure 1 caption fragment: "... is indicated at 160 msec before /t/
release."]

EMG and Audio Data Collection

EMG potentials were recorded from several placements on the superior and
inferior orbicularis oris muscles for each subject, using surface electrodes
similar to those described by Allen, Lubker, and Harrison (1972). The
electrodes were applied to the vermilion border of the lips, and spaced about
a half centimeter apart. The EMG signals were recorded simultaneously with
the audio and clock signals on a multi-channel FM tape recorder. In later
analyses, the channel yielding the EMG signal with the largest amplitude was
chosen; in all cases, this was a superior lip placement. Signals from the
lower lip placements did not appear to be qualitatively different, but had a
lower signal-to-noise ratio.

Acoustic measurements. The acoustic recordings from each of the three
subjects whose data were subjected to detailed analysis were digitized and
analyzed using an oscillographic display of the digitized waveform. For each
of the 18 two-word test utterances, the durations of the /lV/ and /Vl/
sequences were measured for each of the ten to eighteen repetitions, as were
the durations of /s/ friction and /t/ closure and aspiration. Average
durations of the /lV/, /Vl/, and consonant sequences were calculated from the
individual token measurements.

Reference points were chosen for aligning tokens of each utterance type:
for the anticipatory set this was the release of the /t/ before /u/;
for the carryover set, it was the moment of /t/ closure or the beginning of
/s/ friction immediately after /u/ (Figure 1a).

EMG measurements. The EMG waveforms for each electrode position (channel)
and utterance repetition were rectified, integrated ([value illegible] msec,
hardware integration), and digitized. The signals were smoothed, using a
35-msec triangular window, and the ensemble average was calculated for each
utterance and channel from the integrated EMG waveforms, after aligning all
tokens at the reference point in the acoustic waveform. These signal recording
and processing techniques have been described in detail elsewhere
(Kewley-Port, [year illegible]).

Using the ensemble averages, we determined the beginning of orbicularis
oris activity for the utterances in the anticipatory set, and the end of this
activity for the utterances in the carryover set. For the anticipatory set
utterances, the beginning of activity was defined as the time at which
orbicularis oris EMG activity reached 5% of its maximum amplitude. An example
of an ensemble average of one utterance, from the data of subject FBB, is
shown in Figure 1b, with this onset time indicated. For the carryover set,
the end of activity was defined as the time at which orbicularis oris EMG
activity fell to 5% of its maximum amplitude.
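
The processing chain described above can be illustrated with a short sketch. It is not the authors' software (their processing was done with hardware integration and a laboratory computer system); it is a minimal Python illustration under stated assumptions: signals are sampled once per msec, so that a 35-sample window corresponds to 35 msec; the hardware integration step is omitted; and the threshold is a parameter (`fraction`) whose default of 0.05 is only an assumed reading of the criterion. All function names are hypothetical.

```python
def rectify(signal):
    """Full-wave rectification: absolute value of every sample."""
    return [abs(x) for x in signal]


def triangular_smooth(signal, width=35):
    """Smooth with a symmetric triangular window `width` samples long (odd).

    Near the edges the window is truncated and renormalized.
    """
    half = width // 2
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(k) for k in offsets]
    n = len(signal)
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:
                num += w * signal[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def ensemble_average(tokens, ref_indices):
    """Average tokens sample by sample after aligning them at their reference points.

    Returns the average and the index of the reference point within it.
    """
    before = min(ref_indices)
    after = min(len(t) - r for t, r in zip(tokens, ref_indices))
    aligned = [t[r - before:r + after] for t, r in zip(tokens, ref_indices)]
    average = [sum(column) / len(aligned) for column in zip(*aligned)]
    return average, before


def activity_bounds(average, fraction=0.05):
    """Return (onset, offset) indices where activity crosses `fraction` of its peak."""
    peak = max(range(len(average)), key=average.__getitem__)
    level = fraction * average[peak]
    onset = peak
    while onset > 0 and average[onset - 1] >= level:
        onset -= 1  # walk back from the peak while still above threshold
    offset = peak
    while offset < len(average) - 1 and average[offset + 1] >= level:
        offset += 1  # walk forward from the peak while still above threshold
    return onset, offset


def onset_offset_re_reference(raw_tokens, ref_indices, width=35, fraction=0.05):
    """Rectify, smooth, align, average, then locate onset and offset.

    Both values are returned in samples relative to the acoustic reference point
    (negative = before the reference).
    """
    processed = [triangular_smooth(rectify(t), width) for t in raw_tokens]
    average, ref = ensemble_average(processed, ref_indices)
    onset, offset = activity_bounds(average, fraction)
    return onset - ref, offset - ref
```

With real recordings, `raw_tokens` would hold the digitized EMG of each repetition and `ref_indices` the sample at which the acoustic reference point (e.g., /t/ release) falls in each repetition. Searching outward from the peak, rather than from the start of the record, is a design choice of this sketch that avoids triggering on isolated noise far from the rounding gesture.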

RESULTS

Anticipatory Coarticulation

If the beginning of vowel rounding activity were linked to the beginning
of the preceding consonant string, then, regardless of the number of
consonants in the string, this activity should begin earlier when the consonant
string is of longer duration. If, on the other hand, the beginning of the
orbicularis oris activity were linked to the presence of a rounded vowel,
there should be no correlation between the timing of the beginning of EMG
activity and the duration of friction and closure. Since there is a general
tendency for these events to be of shorter duration in clusters, it is
necessary to examine a number of different consonant sequences, of different
lengths, in order to distinguish between the consonant-linked and vowel-linked
onset hypotheses. In the present set, the acoustic durations of the medial
sequences ranged from [?]0 msec to about 420 msec.

Figure 2. Scatter plots of consonant string duration vs. EMG onset time in
msec for anticipatory set utterances, for all three subjects. /i-u/
utterance data are presented in the left-hand column; /u-u/
utterance data are presented in the right-hand column.

Table 1

A. Anticipatory Coarticulation: Slope (a) of best-fit line for consonant
string duration vs. EMG onset time for /i-u/ and /u-u/ utterances.

              FBB              KSH              CEG

/i-u/    a = -.3209       a = .10?9        a = .0006
/u-u/    a = .?927        a = .??53        a = .2899

(? marks an illegible digit; the F ratios for part A are not recoverable.)

*p<.05, but slope is negative
**p<.05. If the /utu/ case is not included, p>.09.

B. Carryover Coarticulation: Slope (a) of best-fit line for consonant
string duration vs. EMG offset time for /u-i/ and /u-u/ utterances.

              FBB              KSH              CEG

/u-u/    a = -.0566       a = .?162        a = -.38?3
         F(1,3) = .1089   F(1,?) = .2152   **F(1,?) = 23.92

(The /u-i/ row is missing from the available text.)

*p<.05
**p<.01, but slope is negative

The onset time of orbicularis oris EMG activity relative to
consonant-string duration is shown, for the utterances of the /i-u/
anticipatory set, in the left-hand column of Figure 2. Each panel shows the
data for one of the three subjects; each point represents the average
consonant-string duration and EMG onset time for about [number illegible]
tokens of each type for two subjects, and 10 tokens of each type for the
third. If anticipatory coarticulation were systematically related to the onset
of the consonant string, we would expect the points to be fitted by a line
having a positive slope; instead, however, the points are fitted by a line
whose slope is not significantly different from zero in two cases, and is
significantly negative in the third (Table 1).
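
The slope test referred to here can be sketched as an ordinary least-squares fit of EMG onset time on consonant-string duration, with an F ratio on 1 and n-2 degrees of freedom for the hypothesis that the slope is zero. The following Python sketch is an illustration only; the function name and the numbers in the usage example are hypothetical and are not taken from the study.

```python
def fit_line(durations, onsets):
    """Least-squares line onsets = intercept + slope * durations.

    Returns (slope, intercept, F, residual_df), where F tests slope = 0
    on (1, residual_df) degrees of freedom.
    """
    n = len(durations)
    mean_x = sum(durations) / n
    mean_y = sum(onsets) / n
    sxx = sum((x - mean_x) ** 2 for x in durations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(durations, onsets))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy  # variation explained by the fitted line
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(durations, onsets)
    )
    residual_df = n - 2
    if ss_residual == 0:
        f_ratio = float("inf")  # perfect fit
    else:
        f_ratio = ss_regression / (ss_residual / residual_df)
    return slope, intercept, f_ratio, residual_df


# Hypothetical utterance-type means (msec), for illustration only.
durations = [100.0, 200.0, 300.0, 400.0]
vowel_linked = [150.0, 152.0, 149.0, 151.0]       # onset does not track duration
consonant_linked = [110.0, 190.0, 310.0, 390.0]   # onset grows with duration

print(fit_line(durations, vowel_linked))
print(fit_line(durations, consonant_linked))
```

Under the vowel-linked hypothesis the fitted slope should be near zero and the F ratio small; under the consonant-linked hypothesis the slope should be clearly positive. The F ratio would be compared with the tabled critical value for (1, n-2) degrees of freedom; with four points, for example, the .05 critical value of F(1,2) is 18.51.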

In the right-hand part of Figure 2, we have plotted the EMG onset time
relative to consonant string duration for the /u-u/ utterances. The results
fit the same general description as the /i-u/ case: that is, coarticulation
began a constant interval before the onset of the second vowel, with a single
exception for each of the three speakers, namely the case having the shortest
consonant duration. A fairly straightforward explanation can be provided, if
we assume that for this case the intervocalic interval may be shorter than the
time necessary for muscle activity to fall to baseline for the first /u/ and
rise for the second. This hypothesis is supported by the fact that, for all
three subjects, the minimum, or baseline, activity for /t/ strings is higher
than for any other (Table 2).

Another interesting result for the two vowel conditions is that there is
a difference in the intercept of the best straight-line fit for /i-u/ and
/u-u/ cases; that is, rounding for the second vowel begins earlier if the
first vowel was /i/ than if it was /u/. Somewhat similar data are presented by
Sussman and Westbury (1981), for /i-u/ sequences as contrasted with /a-u/
sequences. In their data, the difference in onset time is not significant for
the /ikstu/ vs. /akstu/ comparison, although the difference in onset time is
significant for the /iku/ vs. /aku/ comparison. If the differences in onset
time are a consequence of the lip position for the first vowel, we might
expect consistent amplitude differences for the second vowel, depending on the
identity of the first. Such differences were reported by Sussman and Westbury
for the /kst/ cases (see their Figure 3). They do not comment on the /k/
case, where one might expect larger effects. Peak EMG amplitudes for our own
data are presented in Table 3, and, although there is some tendency for peak
values for the second vowel to covary with the identity of the first, there is
no absolutely consistent result.

The analysis presented in Figure 2 does not examine possible effects of
the location of word boundaries. Indeed, in the classic experiment of
Daniloff and Moll (1968), no effects of word boundaries were observed,
although some similar experiments have claimed to show effects of some kinds
of linguistic boundaries (e.g., McClean, 1973). Since there are complex but
systematic effects of word boundaries on consonant duration (Lehiste, 1960),
we re-examined the data for possible word-boundary effects, as shown in Figure
3. It was not possible to examine those utterances produced with a segment
common to the end of the first word and the beginning of the second, because
consonant duration could not be apportioned to one or another side of the word
boundary. For example, as noted above, the sequence that was orthographically
represented as "leased tool" was usually executed as [listul]; since /t/ was
associated with both words, no separation could be made. For the subset of
the utterances where an acoustic event could be associated with the word
boundary, the results are as before: there is no systematic relationship
between onset of anticipatory coarticulation and word boundary (Figure 3).
We would add that, for each utterance set for each subject, the range of
EMG onset times for the orbicularis oris is considerably smaller than the
range of consonant durations (Table 4, part A). If the onset of EMG activity
were linked to the beginning of the measured durations, we would expect the
ranges to be comparable.
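
The comparison of ranges can be made concrete with a small calculation. The
following is a minimal sketch in Python, assuming that each row of Table 4,
part A, gives first the range of EMG onset times and then the range of
consonant string durations, and that the rows fall in pairs under each
subject. The names TABLE_4A and range_ratio are illustrative only and are not
part of the original analysis.

```python
# Ranges in msec, read from Table 4, part A.
# Assumption: first value  = range of EMG onset times,
#             second value = range of consonant string durations,
#             and each subject contributes two consecutive rows.
TABLE_4A = [
    ("FBB", 55, 174),
    ("FBB", 95, 172),
    ("NSM", 125, 299),
    ("NSM", 70, 296),
    ("CEG", 95, 281),
    ("CEG", 120, 298),
]


def range_ratio(emg_range, consonant_range):
    """Return the EMG onset range as a fraction of the consonant duration range."""
    return emg_range / consonant_range


ratios = [(subject, range_ratio(emg, cons)) for subject, emg, cons in TABLE_4A]

for subject, ratio in ratios:
    # A ratio near 1 would be expected if EMG onset tracked the beginning of
    # the consonant string; a ratio well below 1 means a much narrower range.
    print(f"{subject}: EMG onset range is {ratio:.2f} of the consonant range")
```

Under these assumptions every ratio falls well below 1, which is the pattern
described in the text.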

Carryover Coarticulation

Examining the timing relationship between the end of orbicularis oris EMG
activity and the duration of the consonant string following a rounded vowel,
we found a pattern very much like that found for the anticipatory condition.
Specifically, the "offset time" appears to be unaffected by the duration of
the following consonant string (Figure 4). Rather, the slope of the line of
best fit for each utterance set for each subject was not significantly
different from zero (Table 1b). And, again as with the anticipatory
coarticulation data, the range of EMG offset times is smaller than the range of
consonant durations (Table 4, part B). In these data, however, lip position
for the following vowel did not influence the timing of the end of the vowel
gesture. That is, the following vowel is not anticipated in the timing of the
end of the first vowel gesture.
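
The logic of the slope test can be illustrated as follows. This is a minimal
sketch, not the analysis actually used: it fits an ordinary least-squares line
to hypothetical offset times and consonant durations and compares the
t statistic for the slope with the two-tailed .05 critical value for 4 degrees
of freedom. All data values and the names fit_line, offsets_fixed, and
offsets_tracking are invented for illustration.

```python
import math

# Two-tailed critical t at the .05 level for n - 2 = 4 degrees of freedom.
T_CRIT_DF4 = 2.776


def fit_line(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, t for slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    return slope, intercept, t_stat


# Hypothetical consonant string durations (msec).
durations = [80, 120, 160, 200, 240, 280]
# Hypothetical EMG offset times (msec) that do not depend on duration.
offsets_fixed = [148, 152, 150, 149, 151, 150]
# Hypothetical EMG offset times (msec) that track duration.
offsets_tracking = [82, 118, 161, 199, 242, 279]

slope_fixed, _, t_fixed = fit_line(durations, offsets_fixed)
slope_tracking, _, t_tracking = fit_line(durations, offsets_tracking)

print(f"fixed:    slope={slope_fixed:.3f}, |t| > crit: {abs(t_fixed) > T_CRIT_DF4}")
print(f"tracking: slope={slope_tracking:.3f}, |t| > crit: {abs(t_tracking) > T_CRIT_DF4}")
```

A slope that is not reliably different from zero, as in the first hypothetical
set, corresponds to the pattern reported here; a slope near 1 would be
expected if the gesture were tied to the far edge of the consonant string.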
Table 3

Peak EMG Amplitude (in Microvolts) for Vowels of Second Syllable of
"Anticipatory" Set Utterances, with /u-u/ Utterance Peak Amplitude at the
Left and /i-u/ Utterance Peak Amplitude at the Right

peak amplitude     peak amplitude
    /u-u/              /i-u/
Figure 3. Scatter plots of the duration of word-initial consonant strings
vs. EMG onset time in msec, for anticipatory set utterances, for
all three subjects. /i-u/ utterance data are presented in the
left-hand column; /u-u/ utterance data are presented in the
right-hand column.
Table 4

Range, in Msec, of EMG Onset and Offset Times and
Consonant String Durations

A. Anticipatory Coarticulation

Subject    EMG Onset    Consonant    Syllable-Initial
                        Duration     Consonant Duration

FBB            55          174             113
               95          172             119
NSM           125          299             176
               70          296             220
CEG            95          281             174
              120          298             166

B. Carryover Coarticulation

Consonant    Syllable-Final
Duration     Consonant Duration
Figure 4. Scatter plots of consonant string duration vs. EMG offset time in
msec, for carryover set utterances, for all three subjects. /u-i/
utterance data are presented in the left-hand column; /u-u/
utterance data are presented in the right-hand column.
DISCUSSION

The data suggest that the beginning of EMG activity associated with
lip-rounding gestures for vowels is more obviously related to other components
of the vowel articulation than to aspects of the consonant string length.
Similarly, the end of EMG activity associated with lip-rounding gestures is
most straightforwardly described with relation to the end of the vowel, and
not with relation to the following consonant string.

Previously published reports, suggesting that lip-rounding gestures
migrate ahead to the beginning of a preceding consonant string, may be
accounted for by referring to the timing of orbicularis oris activity for the
second vowel in /u-u/ utterances having short-duration consonant strings. In
these cases, lip-rounding activity seems to begin later (i.e., closer to the
second vowel) than it does in utterances having longer consonant sequences.
If one examines only a few utterance types with one or two short and one long
consonant sequence (cf. Sussman & Westbury, 1981), and if an earlier vowel
gesture either inhibits or masks the beginning of the rounding gesture in the
short-string utterances, it may appear as though lip-rounding onset follows
the beginning of the preceding consonant string. However, we believe that our
data cannot be accounted for in this way, nor can the movement study of
Engstrand (1980), which gives the same general picture.

This picture of coarticulation is quite different from the look-ahead
scanner model presented by Sussman and Westbury (1981). In their model, if a
prior vowel is biomechanically antagonistic to rounding, "temporal and
amplitude adjustments are incorporated into the anticipatory rounding gesture."
Rounding begins, presumably, some time after the end of the antagonistic
vowel, but this time is simply displaced, by some amount, from the beginning
of the intervocalic string. Thus, there is always a carryover effect of the
preceding vowel on the onset of rounding; but for all consonant strings longer
than some value, the onset of rounding varies with string duration, presumably
as a reflection of the number of elements in the string. In the model
proposed here, a preceding vowel may have some antagonistic effect on the
onset of rounding, and hence, rounding may appear closer to the second vowel
in cases where the consonant string is short, or when the vowel changes.
However, rounding onset time does not covary with the number of consonant
string elements beyond that point. We assume that the reason Sussman and
Westbury apparently observed a string-element effect is that they compared a
one-consonant sequence with a three-consonant sequence.
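
The contrast between the two accounts can be sketched schematically. The code
below is a deliberately simplified illustration, not either model as formally
stated: it assumes that the look-ahead account places rounding onset a fixed
displacement after the beginning of the consonant string, and that the
time-locked account places it a fixed lead before the second vowel unless the
string is shorter than that lead. The function names and the values of
displacement and lead are hypothetical.

```python
def lookahead_onset(string_duration, displacement=20.0):
    """Rounding onset (msec relative to V2 onset, negative = before V2) if
    onset is displaced by a fixed amount from the start of the string."""
    return -max(string_duration - displacement, 0.0)


def time_locked_onset(string_duration, lead=150.0):
    """Rounding onset (msec relative to V2 onset) if the gesture begins a
    fixed lead before V2, but no earlier than the start of a short string."""
    return -min(lead, string_duration)


# Hypothetical consonant string durations in msec.
for duration in (100.0, 200.0, 300.0):
    print(
        f"{duration:5.0f} msec string: "
        f"look-ahead {lookahead_onset(duration):7.1f}, "
        f"time-locked {time_locked_onset(duration):7.1f}"
    )
```

In this sketch the look-ahead onset moves earlier with every increase in
string duration, whereas the time-locked onset stops changing once the string
exceeds the assumed lead, which is the difference the two-point comparison
of a one-consonant and a three-consonant sequence cannot reveal.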

There is still a good deal that remains unclear about both models and
data. We agree that the onset of rounding is clearly influenced by peripheral
biomechanical concerns; thus, in the Sussman and Westbury data, rounding for
/u/ begins at a different time following /i/ and /a/, and, in our data, at a
different time for /u/ following /u/ and /i/. However, by examining a set of
utterances whose consonant durations for each subject were fairly well
distributed through a wide range of durations, we believe we have shown the
rounding gesture to be linked to the vowel articulation. That is, the
specification of lip position for the consonants is not altered by a migrating
vowel feature. Instead, and as we have also suggested elsewhere (Bell-Berti &
Harris, 1981), we see the vowel-rounding gesture beginning at a relatively
fixed time before the acoustic onset of the vowel and simply co-occurring with
some portion of the preceding lingual consonant articulations.
FOOTNOTES

¹Optimum choice of timing measures from EMG signals depends on several
considerations, including both the nature of the EMG data themselves and the
use for which the measurements are intended. There are three sources of
token-to-token variability in EMG signals whose relative magnitudes bear on
the choice: uncorrected electrical noise, the statistical nature of
motor-unit excitation, and articulatory timing variation. Effects of this third
source are minimized by control of speaking rate and by judicious choice (and
careful measurement) of the acoustic reference point. When the first two
sources of variability are large, and especially when the EMG onsets are
gradual, measurement from the average signal is preferred. Since we
frequently encounter both gradual onsets and relatively noisy signals, use of
the ensemble average in determining EMG onset time is generally the method of
choice (Baer, Bell-Berti, & Tuller, 1979).

²This value was chosen because it assured that we were not identifying
random background noise as the beginning of activity. This 5% point was
exceeded for each speaker for the utterance "loo tool," which had a relatively
short consonant string and, consequently, the minimum level of EMG activity
between the two rounded vowels did not fall to 5% of the peak activity. For
these cases, we chose the time at which minimum activity occurred.



TEMPORAL CONSTRAINTS ON ANTICIPATORY COARTICULATION*

Abstract. Two accounts of coarticulation, that the anticipation
of segmental gestures, and thus the extent of their influence, is
determined primarily according to the compatibility of the feature
specifications for preceding and anticipated phones, and that the
extent of anticipatory gestures is delimited according to temporal
specifications intrinsic to the motor program, yield very different
predictions regarding articulatory organization. These predictions
were tested by varying the number of intervocalic consonants in a
V1CnV2, where V2 was either /i/ or /u/ and Cn was /s/, /st/, or
/st#st/. We were thus able to determine the extent of spectral
changes within the consonant string as a function of the upcoming
vowel. Our results lend support to the second account and suggest
that the onset of a phone's influence on preceding segments is
temporally constrained, presumably because anticipatory gestures are
time-locked to the segments they characterize and are not
freely-migrating features.

A significant issue in speech production theory is the extent to which
articulatory gestures for speech segments are anticipated. From the long-ago
realization that phones were, at least spectrographically, non-discrete,
theories of feature spreading were born in attempts to reconcile a continuous
output with a presumed discontinuous input (e.g., Daniloff & Hammarberg, 1973;
Henke, 1966).

Numerous models of coarticulation have incorporated the notion that the
anticipation of articulatory gestures occurs primarily according to the
compatibility of the feature specifications for preceding and anticipated
phones (Benguerel & Cowan, 1974; Daniloff & Moll, 1968; Henke, 1966; McClean,
1973; Sussman & Westbury, 1981). Coarticulation, according to this view, is
therefore limitless with regard to time and spreads over entire phonological
units until it is blocked by incompatible gestures. In anticipation of a
rounded vowel, for example, lip rounding is said to occur over as many
previous segments as are unspecified for lip configuration, with the extent of
the anticipatory gesture varying directly with the onset of the prevocalic
string (see, for example, Benguerel & Cowan, 1974; Daniloff & Moll, 1968).
An alternative to segment-based models of coarticulation is one that
posits that anticipatory gestures are time-locked to the segments they
characterize (Bell-Berti, 1980; Bell-Berti & Harris, 1979, 1981; Fowler,
1980). Such a model would also predict the coproduction of a larger number of
segments as a consonant string increases, not because an anticipatory gesture
attaches itself to all preceding phones, but as a result of segmental
shortening; thus, an increasing number of segments fall within the relatively
fixed time course of the articulatory gesture. The extent of anticipation is
therefore temporally delimited, without regard to the absolute number of
preceding segments.

A temporal model further predicts that the magnitude of an upcoming
phone's influence should vary as a function of temporal proximity to that
phone. Therefore, the longer the preceding string, the less likely it is to
show coarticulatory effects at its onset, while a shorter string should show
effects over proportionately more of its length. However, at the same point
in time relative to the acoustic onset of the upcoming phone, the degree of
the upcoming vowel's influence should be similar for both long and short
strings.

Alternatively, if, as segment-based models posit, the onset of
coarticulation occurs simultaneously with the onset of a preceding string, we
would expect anticipatory effects to be determined by segmental proximity to
the onset of the string and to bear little or no relation to its temporal
distance from the second vowel.

Procedure

The predictions of either model are testable by varying the number of
elements in intervocalic consonant strings and noting the pattern of second
formant frequency changes as a function of distance from the second vowel. We
therefore constructed real-word utterances with V1CnV2 embedded, where
Cn = /s/, /st/, or /st#st/ and V1 and V2 were either /i/ and /u/ alternately
or both /i/ or /u/. In all there were twelve utterance types or six minimal
pairs, differing only as to the identity of V2. Two male speakers [...] read
[...] repetitions of each utterance embedded in the carrier "Not ___ today"
(e.g., "Not lease ease today"). The productions were recorded on magnetic
tape, input to a Honeywell DDP-224 computer [...], and digitized.

Waveforms were aligned at the acoustic onset of the second vowel, with
line-up points identified spectrographically [...]. Within the consonant
string, and therefore within friction, individual spectra [...] were
averaged in order to increase the reliability of measurement [...]. Although
we are analyzing frequency resonances within friction, we refer to them as
second formants because they appear to be continuous with the second formants
of the flanking vowels [...].
Figure 1 shows two averaged spectra with their respective peaks displayed
above for the minimal pair "lease ease" and "lease ooze" for one subject.
Note the low frequency resonance through the intervocalic portion of these
utterances.

Figure 2 shows the averaged waveforms for the utterances "lease ease,"
"beast ease" and "least steel" for a second subject. These are 200 msec
samples that include the first 50 msec of the second vowel and the 150 msec
preceding it. Thus, at every temporal point relative to the onset of V2, we
are sampling different parts of the acoustic signal for each utterance
type.

It should also be noted that, for the /st#st/ utterance, despite the
orthography, there is evidence of only one friction portion, one closure
period and one release. Thus, this utterance appears to have been produced
"naturally," that is, as [st:], differing from the /st/ utterance only in the
duration of the closure.

While spectral averaging solves some problems, however, it also presents
others. Thus, because individual tokens of a given utterance type are
produced with variable durations, it is likely that the friction and vocalic
portions will be averaged together as the distance from the second vowel
increases. In order to minimize the possibility of confounding the data in
this way, we took the range for all tokens of each consonant string type,
determined a midpoint, and sorted the tokens into long and short bins on this
basis.
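
The sorting step can be sketched as follows. This is a minimal illustration
under the assumption that the midpoint is taken as halfway between the
shortest and longest token of a given consonant string type; the function
name split_by_midrange, the treatment of a token falling exactly on the
midpoint, and the duration values are all hypothetical.

```python
def split_by_midrange(durations):
    """Sort token durations (msec) into short and long bins around the
    midpoint of their range. Tokens at the midpoint go to the short bin."""
    midpoint = (min(durations) + max(durations)) / 2
    short_bin = [d for d in durations if d <= midpoint]
    long_bin = [d for d in durations if d > midpoint]
    return midpoint, short_bin, long_bin


# Hypothetical consonant string durations for one utterance type.
tokens = [90, 100, 110, 150, 170]
midpoint, short_bin, long_bin = split_by_midrange(tokens)

print(f"midpoint: {midpoint} msec")
print(f"short tokens: {short_bin}")
print(f"long tokens:  {long_bin}")
```

Note that a midpoint of the range, unlike a median, does not guarantee bins
of equal size; here three tokens fall in the short bin and two in the long.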

F2 measurements were made from spectral sections at [...] msec intervals
and collapsed over [...] msec intervals for the 150 msec preceding the
acoustic onset of the second vowel. For each minimal pair, values for the
utterances with final /u/ were subtracted from those of utterances with final
/i/. Since initial vowels were always identical, positive values are
therefore indicative of the final vowel's influence, with larger differences
reflecting greater anticipatory effects.
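
The difference measure can be written out explicitly. The sketch below
assumes only what is stated above, namely that the F2 value for the member of
a minimal pair with final /u/ is subtracted from that for the member with
final /i/ at each sampled time point. The function name f2_difference, the
time points, and all frequency values are hypothetical.

```python
def f2_difference(f2_final_i, f2_final_u):
    """Return F2(final /i/) minus F2(final /u/) in Hz at each time point
    (msec relative to V2 onset) that is present in both utterances."""
    shared_times = sorted(set(f2_final_i) & set(f2_final_u))
    return {t: f2_final_i[t] - f2_final_u[t] for t in shared_times}


# Hypothetical F2 values (Hz) keyed by time before V2 onset (msec).
lease_ease = {-150: 1800.0, -100: 1850.0, -50: 1900.0}
lease_ooze = {-150: 1790.0, -100: 1750.0, -50: 1600.0}

differences = f2_difference(lease_ease, lease_ooze)
for time_point, diff in differences.items():
    # Positive values indicate a lower F2 before /u/ than before /i/.
    print(f"{time_point:5d} msec: {diff:6.1f} Hz")
```

With this sign convention a negative value means that F2 was higher before
/u/ than before /i/.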

Results

Figure 3 shows F2 differences along the y-axis for all long and short
minimal pairs where V1 is /i/, after sorting. It should be noted that, when
tokens are sorted in this way, there is temporal overlap between utterance
types. For example, the longest singleton string is longer than the shortest
/st/ string, while the longest /st/ strings are comparable in duration to the
shortest /st#st/ strings, which, it should be recalled, were pronounced
[st:]. Thus, these figures actually depict two, and sometimes three,
comparisons: one for consonant strings of different phonetic structure and
duration, one for consonant strings of identical phonetic structure but
different durations, and, in some cases, one where phonetic structure differs
but durations are comparable.

What the data show is that, despite temporal and phonetic differences or
similarities, the critical variable appears to be time from the onset of the
second vowel, such that there is a similar decrease in the F2 difference for
each pair as their distance from V2 increases. In other words, it appears
that, for utterances of this type, the influence of the second vowel is
temporally delimited irrespective of the segmental composition of the
preceding string.

Figure 1. Two averaged spectra with their respective peaks displayed above
for the minimal pair "lease ease" and "lease ooze" for one subject.

Figure 2. Averaged waveforms for the utterances "lease ease," "beast ease"
and "least steel" for one subject. Accompanying labels depict only
the intended expression and are not transcriptions of the
subjects' actual productions.

Figure 3. F2 difference in Hz for sorted tokens of minimal pairs where V1 is
/i/. Long and short tokens are indicated by closed and open
symbols, respectively. The different symbols denote consonant
string type, with triangles for singleton intervocalic strings,
squares for /st/ strings, and circles for /st#st/ strings. Values
on the x-axis indicate time before the onset of V2, which is
indicated by 0. Temporal points where symbols are absent
correspond to the closure period of the stop consonant. The msec
values next to the symbols in the legend indicate average consonant
string durations for each minimal pair.

Figure 4 depicts the same F2 difference as a function of time from the
onset of the second vowel, but for pairs where V1 is /u/. Again, the long and
short tokens of minimal pairs are plotted, and there is the same temporal
overlap for tokens of different phonetic structure. Perhaps even more than
the first figure, the data illustrate the tendency for all utterance types to
show similar anticipatory effects at almost all sampled intervals.

Note, too, that at -150 msec we are sampling the F2 difference at the end
of the first vowel for the shortest /s/ tokens. It is interesting that the
magnitude of this difference is almost identical with that of the friction
portion of the other pairs. This finding might be explained, not by
anticipatory lip configurations as far back as the first vowel, which for /i/
and /u/ are incompatible, but by tongue configurations that are capable of
anticipating upcoming phones without preventing the successful production of
current ones. Thus, the job of coproduction may be divided between primary
articulators.

Figure 5 shows the data for our second subject's minimal pairs where
V1 is /i/. While the trend is similar in the sense that anticipatory effects
are similar in magnitude at most intervals, the effects diminish more abruptly
over time and at intervals closer to V2.

A possible explanation is the fact that, with only the exception of the
"st#st" pairs, all V1 offsets occur within this 150 msec window. This is
unlike our first subject, whose consonant strings were of longer durations
and, with one exception, fell outside this time frame. Thus, while it may be
possible for these vowels to coarticulate, and therefore show anticipatory
effects, there may be limits to these effects for vowels as opposed to
friction, thus possibly accounting for the rapid fall-off in F2 differences.

It is interesting, too, that there are some negative values, indicating a
higher F2 when /u/ rather than /i/ is the second vowel. However, almost all
of these occur at 150 msec prior to the acoustic onset of V2, the most remote
portion of our sample. And, while we have not tested these differences
statistically, we would speculate that most of these values do not deviate
significantly from zero. The value for the long /st#st/ pair, however, is at
approximately minus 100 Hz, which is substantial, if not significant. And,
since there is no other instance of such a negative value, it is possible that
this reflects carry-over effects.

Figure 6 shows the data for the second subject's pairs where V1 is /u/,
and it is similar to his other utterances in that there is an abrupt fall-off
in magnitude of the F2 difference at -?5 msec. The general trend is, however,
similar, although there is more scatter at the intervals farthest from V2,
which we cannot explain. This differs not only from our other speaker, but
also from this speaker's other utterances.

Discussion 

The data for both subjects show the tendency for coarticulatory effects
to be maximal at points in time closest to the acoustic onset of the second
vowel, independent of absolute duration and segmental composition of the
preceding consonant string. And, while we do not observe the influence of V2
to be identical in magnitude at all points in time, the effects are systematic
enough to support the notion that coarticulation is temporally constrained.

The data thus speak against the notion that anticipatory gestures
automatically extend back to the onset of a preceding string. It was observed
that the early portions of the longer strings failed to show substantial
effects of the second vowel even though they were allegedly free to do so in
the sense that anticipation of V2 was in no way incompatible with their
successful production. Furthermore, some of these F2 differences were
actually reversed, indicating, perhaps, that carry-over effects were still
operative during the early portion of these strings. In addition,
coarticulatory effects for the shortest consonant strings were sometimes
observable during the latter portion of the first vowel. Thus, we see both
the absence of coarticulatory effects in places where segment-based models
predict their occurrence, as well as the presence of effects where these
models, by virtue of the hypothesized mechanisms, predict their absence.

Our acoustic data are consistent with those of Soli (1981), who found the
frequency of F2 within friction to be lower in anticipation of /u/ vs. /i/.
However, he attributes this difference, not to lip rounding, but to different
place of the primary constriction in anticipation of back vs. front vowels.
His argument appears to derive primarily from data showing F2 frequencies to
be similar preceding /a/ and /u/, where both are back vowels but only one is
rounded. According to Soli, the effect of rounding, then, is to alter the
fricative's overall spectral shape above 3 kHz. He maintains further that
"while anticipatory vowel coarticulation appears to be limited to the final
portion of the fricative," anticipatory lip rounding may occur throughout the
fricative (p. 21).

While we consider Soli's general hypothesis regarding the acoustic
effects of anticipatory tongue configurations to be a very tenable one, we
would reject the notion that the general time course of anticipatory gestures
differs significantly for different articulators. In other words, the fact
that the lips are free to round during the course of a fricative preceding /u/
does not mean that they do so. This was demonstrated electromyographically by
Bell-Berti and Harris (1979, in press) and cineradiographically by Engstrand
(1981), whose data show lip rounding to occur at a fixed time before the
acoustic onset of a rounded vowel and to be unaffected by the number of
preceding consonant segments, the production of which in no way precluded lip
rounding. In addition, Bell-Berti and Harris (in press) demonstrated that
certain speakers round for /s/ in totally unrounded environments (e.g.,
/isi/). Thus, one would naturally expect the electromyographic and acoustic
records to differ depending on whether rounding is or is not an inherent
feature of a speaker's fricative production.

The main point here is that while it may be that lip rounding and place
of constriction exert different spectral influences, it is intuitively
unreasonable as well as empirically unfounded to suppose that the general
organization of anticipatory gestures should be articulator-specific.

The results of the present study suggest that the onset of a vowel's
influence on preceding segments is temporally constrained, presumably because
anticipatory gestures are time-locked to the segments they characterize as
opposed to being freely-migrating features. Further interpretation of the
data, however, is limited by the fact that only the acoustic waveform was
analyzed. We are currently planning studies with simultaneous EMG recordings
from orbicularis oris and pertinent intrinsic and extrinsic tongue musculature
in order to determine whether we can account for our acoustic data and Soli's
on the basis of tongue and/or lip configurations. In addition, using subjects
who produce /s/ with and without rounded lips in nonrounded environments
should provide an interesting comparison.
FOOTNOTES

¹It should be noted that while we and others (Yeni-Komshian & Soli, 1979;
Soli, 1981) consistently note low frequency resonances within friction,
previous accounts of the acoustic theory of fricative production (e.g., Heinz
& Stevens, 1961) all but dismiss the presence of low frequency resonances, due
either to the decoupling of the front and back cavities or to the cancellation
of back cavity resonances by the presence of zeroes.
Abstract. Many textbooks state that, in sequences of two
stop consonants in English, the first stop is commonly unreleased.
For nonhomorganic stop consonant sequences, this statement may be
taken to imply that the (necessary) articulatory release of the
first stop has no observable acoustic consequences. To examine this
claim, we recorded sentences, produced by several native speakers of
American English at a conversational rate, containing word-internal
sequences of two nonhomorganic stops, either across a syllable
boundary (e.g., cactus, pigpen), or in word-final position (e.g.,
act, [...]). Oscillograms of the critical words revealed that
release bursts of the first stop occurred in the majority of tokens,
except in those where the second stop was bilabial. The bursts were
acoustically rather weak and difficult to detect by ear, which may
account for their having been neglected in the literature. Instead
of a simple "released-unreleased" distinction, we propose a
five-way classification that makes use of articulatory, acoustic,
perceptual, and contrastive phonetic criteria.



In English, sequences of two nonhomorganic stop consonants are not
infrequent. They occur across word boundaries (e.g., [...]),
across syllable boundaries within words (e.g., cactus, pigpen), and in
word-final position (e.g., act, [...]). Textbooks of English phonetics generally
point out that the first stop in such sequences is commonly unreleased or

unvsplodM. ^o»e cWthors -"e.g.. Ludefog^a, 19**-. pp. MacKay. i^7s, 

;_ ifcn; say v> »nan t?:at . while others e,g . Ab*rcroG*bl#\ r 

3' v f * r K p. c\7 , ior-**%, ' p Kocy^r^ p„ ^' »cr»- 

**t pi I j ; t 3li"-t ! h* *jr • i cj * ator sr. j acr-^sl \ c pvp nt 1 I f:v?i 

flltncf! : ;r*_f*f 4wa. lfi-?Hti t*«? »»ateftent t*:t fir?! :p s 

twu-stup JicijPt--* ;a -jr.re^eas^d 0^ 3;<*a^1 ir.g . If *rv5ea5e* is ^:rferU>- 

lnt^rpr^tc*t a itrut;> art iCw.atp f^ 1 ters, f^frrrt^g t v the £ r «* ik _f 
-or. t ac t r^n » «p e r. * sir t i r u ; at or s t na f r e v; ; t !» in t if r »> ; <> r f - v #*r pr #* ^ u r 



-r^^^nteJ ^* ^V^^^st _r * n#- 

ntario, : n-. *ay **' , - ! - Trie ^3*rar * 
rind WS ,;r*n! ^^OS^^*) tc Hn^k* t Lift 
Le^gh ISKfr, zir4 Igna! Matting;-/ for 
*Also I^pdr^isr'** - f l inguist ic^. > ; vt?r^; 



i -** j #*• t •'ij .^j*' ; *^r?;- - ? i 



- -jr-^octc* g«*r t - Ht->;*. •**• . f I*,** ?t~?t 3* t<> f * (**f;r* 

t*at cf tr^ **;:onl st.-p. -tht^wi sp: jnl pr>3uc<f2 w\m 

if5c^rr#*ct p t o- p f *r» U'u£4tlof- Thrrefo?**. It appears %fih\ 

^etplasiors* to ri*fr-^ rtut the •* t icwi ator y r**;<*:*$f? t-ai itft jjcoy^t t - 

c of> sequgnc y - » ft #> f ?_r* 3p**c* ^igr.^i t^at. !>r feasors f 

t*f^l«olDgi-r^: :c,r:^;^*^nry rf »**pp, wtr prefer to cjj 1 ? he *fei<?35^ 

&ur$L-* If j,>. t^r, , f strict in»t»rpr^n» itn ;f th*» t- r~ * Jfir* imrj* njui-1 
i^p'.y th.jt, even **^;j**r cf t*il r.c-tt? i c«Q* F *ijf ir SVif *~r..ir:* 1 . ^t-Ic^*- 
t 1 t t- f * f - *r J ; *- ^ » 1 t oj> I *i f r* 3 ; r. t h ** ac<>i#9 1 1 r *• ^ * J 

,ttpr( ,p3 ¥*-^r#* :^ o t«?ot~p *vqu*r:?e. pro4-Jce^ th* 1 **** 3p**ifcer 

r#«feai»*] ir-^t tfe* ** r £ 4 ' majority of 1** to*<?n$ contain^ cl*&fiy Identifiable 
rci#M* t--*rs»* ;f ?r,r* fl^^t slap ;3£pp* **5J* £ Ir; a ^ore r*c*nl study asU^ 
3i9l;iir «ttefan^^ pro*?uc«r3 Sy two ^p^a^^r^ ^epp* Jr. pr-ssU d*I tokens (with 

Mngi* *t-rpti-.*n5 contain* J j^e* burs' . e*c**ov*f. ifce £yr3i$ vert? 3rjo*m 
?^.ivi* prrc^pi^^I ^i^nifje^nc^ itert? r^ll?f tuthor^ wroftg. or did t^ey p^rhdp^ 
^^f^r o^ly t- -;cn*ers*tlw»i 5pe«*^r^ , vf *^irn t*:«» utt^f^nce^ ^t^ined 

A — *r-? -^4i*r,g >f vvtip ^* t»tc 5.= ^ »-tt* biggest!* tr^» pnonr*:c;3? ? 
r;4 i^t*n4 Jrfiy coapletely *i^y 'ico-j^t;- ^srj f e^tftt lefts sf t^tr 

:*r tjr-jiatcry f^;#* t >^ r f th^ flf^t »tOp- ?^r ;r.5tafit«r* Atf^rcrsa^ic J -'96?,- 
Points out th.^t, ir, j:t/ 5*qu^ncf*5. *Th^re ea> t> ^ little fdl^il^^ 5ssc^' 
t?:^ %***%r3^*- a as A&rd^/^ Tucket pol^tfl r--^t £ *n " — j 5 but for grpytlcaj 

|:^ f £Jl 3 ? 5 15 ;i-c«pi*l»» ^^Sit^riiy. and ~oy bt* ?crf* f5pf»Ci f IC^I 1 > 

fff^^r^i t- ^.r-ipi^^j** ^ '-u' ^ep^wjais. *r-j J:-fj^? points 

. ^t tr.al . ;i ,tr. 5 - ; ^^jt-n^H, ^Th** p ¥^5 t " dO not 5J*j? 22£JSA 

»rit 3^ to n:i> no n ^ * 15 r^d^d »^*^ lips 3r«? 3^par,ntea* 

p, ^«ff ^^515- a»dt*«*ft?.s 5ygM« ?J ^t t*:^ 1 - tr*,i> rtalhor* w^fe ^^rc 

^1 t'^I^rs^f* t'/f *hf* first *t^p Si»y ocr^r, t--jl th.it 15 ^ubjitattt 1 al I >f 

r "tirar. i--f ;* i. 1*.;.^ :n Er.ft^l^r.. *n r j! -"-! > M*i;ir ^t*j; - IxM t?y -f 

* * . «^ * ?; 1* * #' ; ** f *^ t--«* 3t ^ * > .-r , r %w - f 1* » t ; > . r. i m 

■3 ^ _ h * r , *» 1 * ; *; * 1 ; *.i : 3 : f t . J ^ ^ * f efi^": e ^^p^* n t : n ! r.e * ar 1 1 c ^; 3f 

M^ jfl ^r ; ;;,r-^? _* >r» « r a* ic*; ¥ ^*" v H^ps^ kr * tr.«? &r ^ -^Ji* 

* 3 * 1 . * : i f i c = * 4 1e t e : » r > *-,»r Tr ^ pr * 3c r ! ^t i#3 y pt o v ; J » v ^ ^ ; ^f* s**f* r * 

t*-.jt --^"Ft * ■ t L4?.t ? ^p 4 7>-*»i ^f^t f j f r > 5* p t 

*^r»-.n^:* * * * t r****f 1 * ; • h _» j* # ^ 1 1 1 1 j * * ** , v/5* A - ,i cep^r * t ; * •* 
.r*> . *■ ^ . *^ ; n ^ kt ^ 7 iy in* i . o*** it I? f t «^ >t*rr*g i a 4 ? ? »* 

_nt***-' * * 1 i • r.rfp ct* f 1 *, It*;* 1 * * »• #. **- '- 



With three voiceless and three voiced stops in English, there are 24
possible sequences of two stops with different places of articulation. Of
these, only four (/bd/, /gd/, /pt/, and /kt/) occur in word-final position,
primarily in the past tense forms of verbs. All 24 sequences are permissible
in word-medial position across a syllable boundary, but only two (/pt/ and
/kt/) occur with any frequency, primarily in words of Romance origin.
However, by including some compound words, we were successful in finding two
examples of each of the sequences in word-medial position.

We constructed meaningful sentences, each containing two of the words to
be measured, and the subjects read from a typed list of these sentences. The
sentences are shown in the Appendix, with the critical words underlined. As can
be seen, all stop sequences were immediately preceded and followed by a vowel,
with primary stress on the preceding vowel. (Note that we were not concerned
here with two-stop sequences across a word boundary, although two stops
crossing a morpheme boundary in words such as bootcamp may be considered a
rather similar instance.)

Six native speakers of American English, three male and three female,
were selected as subjects. They were not informed about the purpose of the
experiment, but were asked to first study the sentences and then read them at
a normal conversational speed. Their productions were recorded on magnetic
tape using a Sennheiser MKH microphone, placed approximately 8 inches
from the subject's lips, and a Crown SX 822 tape recorder. The recordings
were then digitized at *0 kHz using the Haskins Laboratories pulse code
modulation system, and the waveforms were displayed on an oscilloscope. We
zeroed in on the closure periods in the critical words to determine whether or
not a release burst of the first stop was present. If present, such bursts
appeared as distinct spikes of a few milliseconds duration, roughly in the
center of the closure period. A typical example is shown in Figure 1a, with
the closure and the release bursts for both stops indicated for the utterance
scapegoat, produced by a female speaker (CG). In some cases, the release
bursts were of very low amplitude, and two of the subjects produced a few
tokens containing multiple or exaggerated bursts, but the token shown in
Figure 1a is representative of the majority of utterances containing release
bursts.

Results 

The frequency of occurrence of a release burst for the first stop in
word-medial sequences is shown in Table 1. The columns represent the six
possible sequences of different places of stop articulation, while the
rows represent the individual subjects. The voicing feature of the stops has
been ignored in this analysis, so that the percentage in each cell is based on
eight words. Looking at the means in the right margin, we see that, overall,
58 percent of the words contained a release burst of the first stop, with the
average percentages for individual speakers ranging from 46 to 81 percent. It
is further evident from the means in the bottom row that release bursts were
not equally frequent in all consonant combinations. The main determinant was



Figure 1. Oscillogram of the word scapegoat produced by a female speaker.
The word is shown excised from its sentence context with the
release burst of the first stop in place (a, above) and removed
(b, below). Panel (a), "with C1 release burst," marks the closure, the
C1 release burst, and the C2 release burst; panel (b), "without C1
release burst," marks the closure and the C2 release burst.
Table 1

Percentage of Words with C1 Release Bursts

                   Place of Stop Articulation
C1:         ALV     VEL     VEL     LAB     ALV     LAB
C2:         LAB     LAB     ALV     ALV     VEL     VEL     Mean

Speakers
NM         25.0    25.0    50.0    12.5    75.0    87.5     45.8
AB          0.0     0.0    50.0    87.5    87.5    87.5     52.1
BR          0.0     0.0    37.5    87.5    87.5   100.0     52.1
CG         12.5    12.5    75.0    75.0    75.0    87.5     56.3
JM          0.0    25.0    87.5    87.5    87.5    75.0     60.4
RK         12.5    87.5   100.0    87.5   100.0   100.0     81.3

Mean        8.3    25.0    66.7    72.9    85.4    89.6     58.0



Table 2

Percentage of Words with C1 Release Bursts

            Place of Stop Articulation
C1:         Labial      Velar
C2:         Alveolar    Alveolar    Mean

Speakers
NM          100.0        75.0        87.5
AB           75.0        75.0        75.0
BR          100.0       100.0       100.0
CG          100.0       100.0       100.0
JM          100.0        25.0        62.5
RK           50.0        75.0        62.5

Mean         87.5        75.0        81.25



the place of articulation of the second stop. When the second stop was
labial, release bursts of the first stop tended to be absent (except for one
speaker's velar-labial sequences); when it was alveolar, release bursts were
present in the majority of utterances; and when it was velar, release bursts
were even more common. The place of articulation of the first stop seemed to
play only a minor role, and we also observed that the voicing feature had no
consistent influence on the occurrence of release bursts.¹

Table 2 shows the same analysis for the word-final stop sequences (see
Sentences 1-4 in the Appendix), with the columns representing the only two
possible sequences of place of articulation, and the rows representing the
same individual subjects. Again, the voicing feature has been ignored, so that
the percentage in each cell is based on four words here, since no words
containing stop sequences differing in voicing (e.g., /bt/, /pd/) occur in
word-final position in English. The means in the right margin show that,
overall, 81 percent of the words contained a release burst of the first stop,
with the average percentages for individual speakers ranging from 63 to 100
percent. The means in the bottom row indicate that, as in word-medial
position, the place of articulation of the first stop had little
effect.
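The marginal means quoted above can be rechecked directly from the cell entries of Table 1. The following is a minimal sketch, not the authors' procedure: the cell values are copied from the table, the figure of eight words per cell is taken from the text, and all function and variable names are illustrative assumptions.

```python
# Cell percentages as printed in Table 1 (word-medial sequences).
# Column order is C1-C2 place of articulation, as in the table header.
SEQUENCES = ["ALV-LAB", "VEL-LAB", "VEL-ALV", "LAB-ALV", "ALV-VEL", "LAB-VEL"]

TABLE_1 = {
    "NM": [25.0, 25.0, 50.0, 12.5, 75.0, 87.5],
    "AB": [0.0, 0.0, 50.0, 87.5, 87.5, 87.5],
    "BR": [0.0, 0.0, 37.5, 87.5, 87.5, 100.0],
    "CG": [12.5, 12.5, 75.0, 75.0, 75.0, 87.5],
    "JM": [0.0, 25.0, 87.5, 87.5, 87.5, 75.0],
    "RK": [12.5, 87.5, 100.0, 87.5, 100.0, 100.0],
}

WORDS_PER_CELL = 8  # stated in the text for the word-medial analysis


def burst_counts(row, words_per_cell=WORDS_PER_CELL):
    """Convert a row of percentages back into whole-word counts."""
    counts = []
    for pct in row:
        count = pct * words_per_cell / 100
        if count != round(count):
            raise ValueError(f"{pct} is not a whole number of {words_per_cell} words")
        counts.append(int(round(count)))
    return counts


def speaker_means(table):
    """Mean percentage per speaker (right margin of the table)."""
    return {speaker: sum(row) / len(row) for speaker, row in table.items()}


def sequence_means(table):
    """Mean percentage per place sequence (bottom row of the table)."""
    rows = list(table.values())
    return {
        seq: sum(row[i] for row in rows) / len(rows)
        for i, seq in enumerate(SEQUENCES)
    }


def overall_mean(table):
    """Grand mean over all cells."""
    cells = [pct for row in table.values() for pct in row]
    return sum(cells) / len(cells)


if __name__ == "__main__":
    for seq, mean in sequence_means(TABLE_1).items():
        print(f"{seq}: {mean:.1f}")
    print(f"overall: {overall_mean(TABLE_1):.1f}")
```

Because every cell is a multiple of 12.5 percent, `burst_counts` recovers integer word counts, which is a quick consistency check on the printed table.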


The pattern in these data can be understood by considering the
articulatory maneuvers involved. When the second stop is labial, the speaker
has the option of closing the lips before an earlier alveolar or velar closure
is released, and if this option is followed, the release of the first stop
occurs during the labial closure and therefore has minimal acoustic
consequences. On the other hand, if the first stop is labial, although an
alveolar or velar closure may be established before the lips are parted, the
labial release, if it occurs, will generally produce a burst because there
is no occlusion anterior to the lips. (The occasional absence of a detectable
burst may be due to changing local conditions, e.g., dryness of the lips, that
affect sound generation.) When one stop is alveolar and the other velar, we
must take into account that the same articulator, i.e., the tongue, is
involved. Even though, in principle, the tongue tip could establish contact
with the palate before the tongue body releases its contact (and vice versa),
this is a difficult maneuver that speakers do not commonly employ. Our
data show that release bursts occur both in alveolar-velar and velar-alveolar
sequences, suggesting that the second closure is established shortly after the

rt-ieasi* r J*fjt :r t*i* -i-js^f t-fid* >f two steps o^fUp^. 



f*f5» ioh^J9crg*M. h ?v*:«5l:p ^cq^r.?^ t ur>f ! jSe-3 

* r a f * * *" that ; ^i#»f ^ *•» : -r # cn ^ * ,-t ^ ? absencr of 'lease bur St * a 
**• h**?? ? v%jn^ L K dit 'i*>^s»' r-^^t^ i^tf;;, present , ^* i^ast sn tr^^e 

* *\ .^^y *_*5t !-tf^.-»fs* 4 ^ in l/igiisn ^..f v t*voS« *r* iiti|--r: 

m*\kl j e---» ftr.^rr.U'r ?^t* :-T(fcftr^ Mf^ o! * *>* st^p se^^fifes arro$s i* 



probability of occurrence of a release burst of the first stop. As for two-
stop sequences in word-final position, which are typically cited in
discussions of "unreleased" stops, our data show that release bursts of the
first stop are actually more frequent than in word-medial position.

Although some authors (Abercrombie, 1967; Jones, 1956) mentioned faint
release bursts, it is our impression that their occurrence has not been
generally acknowledged. One reason for this may be that they are difficult to
detect by ear. We conducted a brief experiment to address this issue.



Method

Five typical utterances were chosen from the speakers' productions, all
containing release bursts of the first stop (cactus, ribcage, Edgar, bodkin,
scapegoat). Using the Haskins Laboratories pulse code modulation system, we
excerpted the words from their sentence context and then created a second
version of each, in which the release burst of the first stop was replaced with
silence. Figure 1b shows this modified version of the word scapegoat, the
original of which is displayed in Figure 1a.

We then constructed two discrimination tests. In the Yes/No test, each
of the ten stimuli occurred ten times in random order, with interstimulus
intervals (ISIs) of ... sec. In the 2IFC test (two-interval forced-choice
test), the two versions of each word were arranged in pairs, with the modified
version either first or second. The resulting ten pairs occurred ten times in
random order, with ISIs of ... sec within pairs and ... sec between pairs.
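The design of the two tests (ten stimuli or ten pairs, each presented ten times in random order) can be sketched as follows. This is only an illustration under stated assumptions: the word list is read from Table 3, the response coding (True for "burst present," 1 or 2 for the interval chosen) is hypothetical, and the interstimulus intervals are not modeled.

```python
import random

WORDS = ["cactus", "ribcage", "Edgar", "bodkin", "scapegoat"]
VERSIONS = ["original", "modified"]  # modified: first-stop burst replaced by silence
REPETITIONS = 10


def yes_no_trials(rng):
    """Ten stimuli (5 words x 2 versions), each presented ten times, shuffled."""
    stimuli = [(word, version) for word in WORDS for version in VERSIONS]
    trials = stimuli * REPETITIONS
    rng.shuffle(trials)
    return trials


def two_ifc_trials(rng):
    """Ten pairs (5 words x 2 orders), each presented ten times, shuffled."""
    orders = [("original", "modified"), ("modified", "original")]
    pairs = [(word, order) for word in WORDS for order in orders]
    trials = pairs * REPETITIONS
    rng.shuffle(trials)
    return trials


def score_yes_no(trials, responses):
    """Percent correct; a response of True means 'release burst present'."""
    correct = sum(
        (version == "original") == response
        for (_, version), response in zip(trials, responses)
    )
    return 100.0 * correct / len(trials)


def score_two_ifc(trials, responses):
    """Percent correct; a response is 1 or 2, the interval judged to hold the burst."""
    correct = sum(
        order.index("original") + 1 == response
        for (_, order), response in zip(trials, responses)
    )
    return 100.0 * correct / len(trials)
```

In both formats a listener who cannot hear the burst is expected to score near 50 percent, which is the baseline against which the discrimination scores should be read.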

N.n** 2 *n ;e *s participated; tr.e/ were the t^.. authors i sev**r 
- - * * e d!^ ^ 3 _ » Ha ; n s La bar a tor 1 ** 3 with v ar y 5 nf aE & *_ 5 _ f pnone 1 v t *- 3 \ r . : g 
a ei p*r ; *: * n e 2 / N o -41 ?,<: r I a t r. ^ 1 1 or. t es • . t he > we t e pf ^ / e C ' t ~i 

rfrat**?r. : •he ^an^oeii^ 7 t6*ens m-i were asic*--3 t« ;r;-^ 5 wtietner e**.r 

sti*^.us Jl.-' ^- f . ^ohtai r ^)^?se b^rst of the l\r3*- ^onsoftart in *f;*- 

twj-str;* iii^vf., >. *he stits*»^$f:t ?IF* tes* . •r.e tutje^ts j*3Ke3 tc 

.ISt^f. ? pti f ^ ;f w^r^^ ar.d then ir.^i.ate wh;'^*'. s#^ber , • _ n t- fl^st the 

St or. :! , - ? » 3 ; f: 1 * • e I f ^s e ur a » T> *^ ?• }e: ^ 3 we e * . 1 * ha t * _^ t jt 3 1 ^ 
**gr? te u?!> * -. = e-.sr a< . l^te**--* t- *^ ^ - v^f .it-i*'a* ^ r 

t f r ' 

* ** s ^ - * i 



T *; e ^e * ^* *? *" e r . * 


44*-^ 






r ♦* .** p* J f ** H tf* . ^ , i *jf - 


* J * * > *- 


t ► r f. 


five « - r i 5 ; i 3 p * a y** 
the release burst and its temporal separation from the much stronger release
burst of the second stop.



Table 3

Mean Percentage Correct Discrimination

                  Discrimination
Stimuli        Yes/No      2IFC

Cactus         49.…        51.…
Ribcage        61.…        …
Edgar          62.2        …
Bodkin         6….4        62.2
Scapegoat      …           80.5

Mean           61.…        6….0


There was also considerable variability between subjects. In the Yes/No
test, the two authors performed at 83 and 85 percent correct, respectively,
whereas the scores of the other seven listeners ranged from 45 to 66 percent
correct. In the 2IFC task, the corresponding values were 89 and 79 for the
authors and ... for the other subjects. Thus, if one excludes the two
subjects who had pre-experimental experience with the stimuli and perhaps knew
better what to listen for, there is little evidence that even phonetically
trained listeners can detect the faint release bursts of so-called
"unreleased" stops. This is, then, the likely reason why the bursts were not
noticed by some earlier authors who relied on their auditory impressions.

CONCLUSIONS 

In this paper, we have reported some data relevant to the statement that,
in English, stops followed by a different stop are "unreleased." We have
examined several possible interpretations of that statement: (1) If it is
interpreted as referring to articulation, it is clearly false. (2) If it is
interpreted as referring to the acoustic signal, it is not generally true
unless the definition of what is to count as a "release burst" is restricted
to acoustic events of a certain minimal duration and amplitude. While such a
restrictive definition may have been implicit in some previous discussions of
"unreleased" stops, it should be noted that, on the contrary, the term "burst"
is appropriately applied only to the signal portion excluded by such a
definition, viz., to the brief transient generated by the stop release,
exclusive of any following aspiration (cf. Dorman, Studdert-Kennedy, &
Raphael, 1977; Fant, 1973). (3) If the statement is interpreted as referring
to perception, it appears to be accurate insofar as stops preceding another
stop in conversational speech have release bursts that are difficult to detect
by ear. In this sense, the stops in this study were indeed "unreleased." (4)
The possibility remains that some phoneticians have used the term "unreleased"
in a purely contrastive sense. In this usage, even a stop with a detectable
release burst might qualify as "unreleased" relative to some standard for
"released" stops. The stops recorded by Repp (1980, in press), whose release
bursts were 10-40 msec long and quite detectable, may fall in this
category. An obvious problem here is the absence of any clearly defined
criterion separating the two classes.

These considerations illustrate the confusion that can result from
terminology that is not only vague about the level of description to which it
refers (Repp, 1981), but also insufficiently defined at the level intended.
Many phonetic distinctions that are couched in acoustic terminology have been
drawn at some remove from the speech signal. In that respect, the term
"unreleased" is similar to the term "unaspirated," which is commonly applied
to consonants, such as English [g], that exhibit a good deal of aspiration in
the acoustic signal. While these terms may be sufficient for the field
phonetician, they do not reflect the level of detail that acoustic
phoneticians are concerned with, and therefore are of limited use.

We propose the following, more detailed classification, in which
"release" is reinstated as an articulatory term:

(1) Unreleased: The occlusion is maintained, as in a stop preceding a
homorganic stop or in many utterance-final stops with delayed release.

(2) Silently released: No release burst in the acoustic record.

(3) Inaudibly released: Visible release burst in records of the signal, but
not readily detectable by ear.

(4) Weakly released: Release burst detectable by ear but clearly weaker
than in (5).

(5) Strongly released: Release burst is followed by substantial aspiration
or voicing.

In this scheme, successive classes are separated by different criteria:
(1) and (2) by an articulatory criterion, (2) and (3) by an acoustic
criterion, (3) and (4) by a perceptual criterion, and (4) and (5) by a
criterion of phonetic contrast or classification.
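Because each pair of adjacent classes is separated by exactly one criterion, the scheme can be read as a chain of four yes/no decisions. The sketch below is one possible rendering under that reading; the class and field names are illustrative assumptions, and how each criterion would actually be measured is left open, as it is in the text.

```python
from dataclasses import dataclass
from enum import Enum


class ReleaseClass(Enum):
    UNRELEASED = 1
    SILENTLY_RELEASED = 2
    INAUDIBLY_RELEASED = 3
    WEAKLY_RELEASED = 4
    STRONGLY_RELEASED = 5


@dataclass(frozen=True)
class StopObservation:
    occlusion_released: bool        # articulatory criterion: (1) vs. (2)
    burst_in_acoustic_record: bool  # acoustic criterion: (2) vs. (3)
    burst_detectable_by_ear: bool   # perceptual criterion: (3) vs. (4)
    strong_release: bool            # contrast criterion: (4) vs. (5),
                                    # e.g., substantial aspiration or voicing


def classify_release(obs: StopObservation) -> ReleaseClass:
    """Apply the four criteria in order, stopping at the first one that fails."""
    if not obs.occlusion_released:
        return ReleaseClass.UNRELEASED
    if not obs.burst_in_acoustic_record:
        return ReleaseClass.SILENTLY_RELEASED
    if not obs.burst_detectable_by_ear:
        return ReleaseClass.INAUDIBLY_RELEASED
    if not obs.strong_release:
        return ReleaseClass.WEAKLY_RELEASED
    return ReleaseClass.STRONGLY_RELEASED
```

For example, a stop whose burst is visible in the waveform but not detectable by ear would be passed as `StopObservation(True, True, False, False)` and classified as inaudibly released.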

In summary, our studies indicate that, in English, stops preceding a
nonhomorganic stop in conversational speech are generally released inaudibly
or silently, silent releases being particularly common when the following stop
is labial. The observations of Repp (1980, in press), on the other hand,
suggest that similar stops produced in isolated disyllables are typically
weakly released.
FOOTNOTE

¹We considered the possibility that the absence of release bursts in some
tokens was due to the substitution of glottal stops for alveolar (and,
perhaps, velar) stops. In the informal judgment of the first author, 22
utterances may have contained glottal stops. In 18 of these, the putative
glottal stop preceded a labial stop. Release bursts were observed in 4 of
these 18 tokens (22 percent), which is slightly higher than the overall
incidence of 17 percent in this context (cf. Table 1). Thus, to the extent
that glottal stops did occur, they did not change the pattern of our results.
The incessant burrowing of the new-born pigs had turned their pigpen into
a huge mudpuddle.



/ f /. On* of Deborah 1 n fa/or ite nobbier tspdaneing, especially to jazz an'! 

/]. fio*e p*opl* olaim tnat Ninon wan only a scapegoat in the cover-up of 
C,i,A, vhewlng and subterfuge, 

''r% Margaret • aught her > year old v,n drying to shoot a mijj^je with hi* 

✓ 5. In tnr Mil, * h* catfcini hanging '/utside the backdoor of th* cottage were 
re^i i y beaut i f u« . 

^ ,4 . Hy grandfather always innfrU'J a hatpin or a bodkin into h^r cakes to 3ee 

if *ney were re*dy to be rw>vM from the oven, 

-'S. 7? } e «&arin* os oiogi nts made a movie about th** development of tadpo le 5 into 
f^ogi tnrough a trapdoor mechanism on the side of the artificial ond . 

rU t joifojii^ for th* Oovernor and ^uhgovernor of India during the early 

I'/iu 1 * f oi d only be differentiated by the h end gear and the collar 
mar kl ngi . 

si l* see*f5*'d tha* *oe «ner;u for bo'ittaffip «onsi.nted largely of jpotjgies and 

;>»/H tried preven* ?h« bogdown of hi 5 ? ar by putting nack? under the 
wheein, ou* af*er a few «*t*fflpt*i at moving it, it lank up to the h ubca ps 

; h * te , 
OBSTRUENT PRODUCTION BY HEARING-IMPAIRED SPEAKERS:
INTERARTICULATOR TIMING AND ACOUSTICS*



Nancy S. McGarr* and Anders Löfqvist*



Abstract. This study examined the organization of laryngeal control
and interarticulator timing in the production of obstruents and
obstruent clusters by three severely-profoundly deaf adults.
Laryngeal activity was monitored by transillumination; temporal
patterns of oral articulation (lips and tongue-palate) were recorded
using an electrical transconductance technique. For each of the
deaf speakers, an inappropriate laryngeal abduction gesture was
often found between words, a pattern never observed for hearing
speakers. At the same time, the deaf speakers differed from each
other with respect to type of errors, variability, and
interarticulator coordination. For the most intelligible speaker, the timing
of glottal opening with respect to oral articulation was most like
that observed for normals. The second deaf speaker often failed to
observe voicing contrasts with respect to glottal opening. This
subject was nevertheless consistent in producing most plosives
without a glottal opening, and all fricatives with an opening
gesture. For the third deaf speaker, the pattern of errors was more
complex and included both missing and inappropriate glottal opening
gestures.



Production of voiceless obstruents requires intricate coordination of
several articulatory systems. At the laryngeal level, an abduction/adduction
gesture normally occurs to stop glottal vibrations and assist in the buildup
of oral pressure. Supralaryngeal adjustments are also necessary to produce a
closure or constriction. Thus, laryngeal and supralaryngeal articulations
involve simultaneous activities that must be temporally coordinated.
Differences in the relative timing of the laryngeal and oral gestures are used



*Parts of this paper were presented at the 101st Meeting of the Acoustical
Society of America, Ottawa, May 18-22, 1981.
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in «* wide variety uf language^ to produce contrasts \r. ,*ri ; r^;' j 1 ; * 

u^f. Linker & Aoramson, 1964; 1.51 4V is* & YoaMcrfa, : VfT > . 

Since the larynx is placed in an inaccessible and invisible position, it
is reasonable to assume that coordination of interarticulator gestures is
learned by auditory monitoring of the acoustic signal. Developmental studies
suggest that children master sound contrasts requiring laryngeal adjustments
(e.g., voicing and aspiration) by attending to their acoustic and perceptual
consequences (Kewley-Port & Preston, 1974; Zlatin & Koenigsknecht, ...;
Gilbert, ...; Macken & Barton, 1980). These studies also show that obstruent
contrasts emerge relatively late in children's speech and that production is
more variable in children than in adults. The acoustic cues for obstruents
are complex, spread over time, and involve differences in the sound source and
the spectral composition of the signal. For example, in the production of a
voiceless fricative in a vocalic environment, the sound source changes from
periodic to aperiodic and back to periodic. Similarly, a voiceless aspirated
stop in the same environment is associated with the following sequence of
source changes: periodic voicing during the preceding vowel, silence during
the closure, transient noise, aspiration noise, periodic voicing during the
vowel. In addition to being spread out over time, the acoustic attributes of
obstruents often involve short-term spectral changes, where high frequency
components play an important role. Examples of such attributes are release
bursts and formant transitions for stops, and spectral characteristics for
fricatives.



i * / * i tr*e :j*!jplejt jrt ; ^lat :ry ar* J d^.j^! J : - per t j * •nit^*** 1 1 o 
i e 5 *> ob 3 ? r ue n t s , one we ^ 1 1 e< pec t he ,i r ; r«g^ » *3 pa i * :ip«- a*e r h fc > ?. a *- j^r t : : * 1 <» r 
probi**ms with tms c*as.> of sounds. t\? indnerj tr.e . *tse, a^ s*»uwn 

several descriptive and acoustic i*udi*»s. For ei^p^, r^ar *'*g~ ifipa. * e J 
speaker s f r equen t : y fail to make the vo I *: ed~ vc * - e i e s n 1 : s* i m-j 1 i c - Hud » . 4 
Number;!, Ko, In some studies, tnis subsiltut io?. in re^rte: is - irr;v 
to the vjlced member uf th** pair Heiier , Hn ter, ^ '-vices . * i + * , a^* , 
Nilitn, '^''l; .Smith. I'j'Si, and at 4 1 .h ? i«fs t.i 1?.** -v r.gn^t*- 

Kangan. Nnber , 1^ ' ; M a ricMes. At the a e#*-. se-vera* 

studies r **p /rt j .a- K uf »Ji e .^nse' t iMttn^tioti J ^ Se t! 3p*»ant» 1 ',- 

Hi^nsen . MaJ.sMe, nnd ai irN rn**t p *^sjre or ^'.*". % t: n ; # rat; . f 

13 different lr*m notzictis 'Jawe-'t, * * ; »sberg**r ^ L".»t? , 
Production ;f ibstruent :Iu^>r»? ^3 r^^, par* S -**ar*y Jjtfi -j.'. f^r ?»*a"ir:g- 
iTSpairHtj 5; ■'jk'TS Hudgms h N^b**rn, ' ; b r an»--r . * * f • . ^'^ 
Reported err r pat*efns f-»f tM«se b***rriH . „-Je t ne Jr_'p^;*.g f :f>i- r ^ f *• 
cjin^nenti ;f tf:** 'i-^st^r. r o,** alisrg jr* al»** ,? *t m , » 43 ***»gr:»'*j* a > ;.i . t : 
* fi<M n^,/ K r ( 'j ( . between Li:*-" ** , ***ser-f s . k?*' n j -i'-R . r- 

*^3ter pr -d-*» t : fia^e tee^ 3K-.wn t-j fe t i.. ^ * 1 .g : r ; , ' v : 

\* h\ 3re#«-r. ;^l»>t c : j3* y Mjlgiri^ ^ S — . * 

WT. ; .e »•**' ";ay j.res;f!ic t* lr i* ; -i h { s v 1 * 1*"* " ..a' * 

j • Si'«a?»~'fi trat .eais * - *. h**Sr* ^ er »;* el * .» j:** ; i^^/i. ' 

^>33i:..e * f 1 * *i te ^j'. ,rf «: **e , ^1 wr.g**a* /*'sd 'O^ f . y f A*'** * 

eveo* 3 fr ^ *^».»* -f - ,n* ; n- »r # , It** ^ **e \\.> f i * . 

tr,e r *i r**, ^a* 1 *. je!rf;»#»2 ! . : c. r - : j? . ». p»»* >* * .. < f * * 

events 1 u /#»•;• ^r.c s ,r , ,«*r.« : • . * ^ . *~ ; t* . ** *- 
•"ie^r : r »g- i"s;*.t ; * *• ! i r 1 '• *d! j *-arf* 4 ' " 
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* 1 1 * ** ' 

The Percentage of Cc^re^i Productions. th#* Percent f ; r* # 
*nj in* Frror jt^oru^ Are Sh;>wr. _ 
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.e*3l •-ftrS* trj^M*^ If-* r. >.., ; ^ : it j**,.! * $r. v 

a31-M ■ *; *.d'. Swaps ttsMrs ?<>af-i#5* . 

V*- *: <sse;>;»3 fr<JS *ac*i of ? f.<* **«ar ir.>t- r r 1 iicf?*3 ^r*/ ir.-i; ,**#•* 
*- *« «r»ya Rrv„, two JiV-cnef s : tre ay! ft.-;r * 1 ~,*J<* Oru*i tr.cr.eii/ t-fj'-- 
Krjpt ;f *r.P icM* p*o<Suz«3 by lt\* $c?t S^&ji'C**. JTf parties.** 

tnttrvat ^ot^ir.g States of tftf* cfe»*f^*nts ana cfcslr ^*«jiterj V 

iJWrff 5 Scrsf-sJ. -H^ ? =-" 5«spi*5 *r*r*? f4l*<J fc?r / v***- jil * i*»K-LJJ- 

giDiSity try ,} \m^*rr r <*r highly p**r ! one wltn * .jraf fr^i icviitg *fcr for*.** 
5 r.**;n* fvf jnt «*i I igi fcl * M y fubtriny, d*?a- r speaker - :i f j;i 

^ - ; { *$ ; r.»* i I s g 1 £4 f* the exception ,5 a vcrdA or ^r^*. 

T* * lp*e-:ft if i*f4f 3p*4«*f 2 0* .h^^:tffUe1 t*5 i:rn?„il -r..3«K5T 

2 * -_• S ** . ** ■* ' ot- ntfvff^S. ft«-i«5 ft 1 5*; •• f sf -js'* -^> PJ ^ * 1 »- •* t * •** t r A i * - 
t .jel"dtL n ^^.r^S a f^?<* passage Icr ine frcae the f I t-^r^cop *> 



Procedure



Laryngeal activity was monitored by transillumination (Sonesson, 1960).
A flexible fiberscope inserted through the nose and held in position by a
headband provided illumination of the larynx. The amount of light passing
through the glottis was sensed by a phototransistor placed on the surface of
the neck just below the cricoid cartilage and coupled to the skin by a
light-tight enclosure. The transillumination signal was recorded on one channel of
a multichannel instrumentation tape recorder. During the recording session,
the view of the larynx was monitored through the fiberscope in order to detect
shifts and fogging of the lens.



Temporal information on laryngeal articulatory movements obtained by
transillumination has been shown to be practically identical to similar
information obtained by fiberoptic filming of the larynx (Yoshioka, Löfqvist,
& Hirose, ...; Löfqvist & Yoshioka, 1980). Transillumination is thus an
excellent tool for studying laryngeal behavior in speech. It has a better
temporal resolution than fiberoptic filming and video recording. Data
collection and processing are quick and easy, and larger amounts of material can be
handled than with any other method available for laryngeal investigations.

7**npL r .* 1 £ a t i e? r. s I :* J* a 1 ar ■ i -: u 1 a * I ~ -n *#e r e recorded y 5 X ng an •? I e : t r 1 c a I 
* ? af.j.r-r-5 ^l.rce tezhiv.-j-je rf. £«ri 550ft 4 ^crd, i^'-;-, Tne electrodes of a 
*-Uf;e1 laryngo*raph «ere piar^d ?n tne apper and lower 1 ipS; respectively, 
r=s*t and ff^et Sip of tongue-palate con tact c:-jltf the?. Oe identified frr>ss 
ir. the e^e^tri^al signal, *ignai war? ^eccf^e-t • f- another onannel 

-« ^,^» - ,-«.♦-* at ic:: •e-rerler. 

n v ** r- * . , r a . ,i . . . 3 1 ; -: " e _ -j,r i l n 4 * » f * r ** 1 ? a 1 n ed ^ ; ;si# L * it n «* o w 3 * y ^s : ng a 
*^v: r . ; -r-i# l p ;v«» si^rtfrc-r.i* Tne voice signal «a:? recorded ;r. 4 4 rec? .node 
" ' h *. n f "p^ir^r. A ^* he $ : r ; t-«* ; 3** ^ ve . t ne ^ #• . c 1 i ng 3 .• f a 1 1 p r nd u\: u i on a 
lv ' * * f"** if f i -**:JjAe*3 w**re B ?».^r ^;e4 * >tt-i! n ; ; 3t<*n*-r ; figment 5 . 

? , " »*a-? ' * <* , ? /? .* * -J T**-tS'jr#»»<k?;t » we*"t: ^aOe rv^v^r.J ' ^ *t*z\ ^ 
*\r \ «f,ifi .^v.t? y .."jjrf ,.tr J •-*r.st rj i-;n Juration wer rate frc^n tr.e 

•*i*r.s, m.f v.^;.i\ 'frjr •srf/.i:.; :atia; --r t 5ng^e-; ,f ; ate »nta*-t. 

*^^.c^; n m >f.'.e^ set .=5 1 at : a , - I i»n^ at *s ^-"ntact , **%eas^ 

* J *r fi Set ! * » « f ;.**f;r_;r£ 'h»f5e I* r -My 31 : 1 Cg4 : a 1 i «*r«r5 *3 

; a* • I - ^. a r - 1 'c.J , * *-»■ j*?**- f 5pea*e: r , closure ?r '~onstr; ■ - 

*■ * -* a* - " ** * { 1 , . • i : riMV.;n» ; - .ii i^U: wf*v^f.:rr, r 

* * . rf ->-^e^ + ^ l ;- J ;ni- r 1 the ^.r?r,-£ >. : pear. ^-U* i. >f.er,; ' * fH^Vf.J 
, - t w (k pc-«r! T* r .;S ^-int ^v^j •»*.e ^^cj f j tne anjurti^r a? J tr ( e 

-vg.-.t ' - .»e ajj^ *.; *; .! tf.e v V * . ioli^, ar.J ;s ter r^>to*' t r,'. ; . 

;-^! f ^* ; ' ? * * -. S*eriCf ~ri r m /te* r ; : af.J tf'* ;f * er«r yten. : : 

-* .f " ? ** r\r ' J? I-*:! a. -;e.n;*;g ?t^g: * Y i> J > > *e^. t - 4: Str 

r « ^ e , T,3r,:i»a, ^ *il ;~; , 1 ^ J ; « k n * *i * 5M-'»<ca, 

7'.* . * ; »*3t » *e « ; »* iif * gl^*T r i, * t»* ; ^ i- f . ; ;** 

x * . i ; *- z : . ii f * * ?r e ~t . 4 - r * '„ , f ! ^ #»** . f M 
was calculated. This measurement provides an estimate of the relationship
between onset of constriction or closure and the beginning of the adduction of
the vocal folds. It is useful since it highlights differences in timing
between obstruents, e.g., stops and fricatives (Löfqvist & Yoshioka, 1981).
A second measurement of interarticulator timing was the interval from peak
glottal opening to offset of labial or tongue-palate contact. This measure
shows the relationship between onset of glottal adduction and release, and is
particularly useful in examining timing differences between different stop
categories (Löfqvist, 1980). The physiological measurements were supplemented
by acoustic measurements of voice onset time for stops. All measurements
were made interactively on a computer.
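The two interarticulator timing measures and the closure duration are simple differences between three event times. A minimal sketch follows, assuming each token has been reduced to the times of implosion (onset of oral contact), peak glottal opening, and release (offset of oral contact) in milliseconds; the class and attribute names are hypothetical, and the sign convention follows the one given with the results, where a negative second measure means that peak glottal opening followed the release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObstruentToken:
    implosion_ms: float             # onset of labial or tongue-palate contact
    peak_glottal_opening_ms: float  # time of maximum transillumination signal
    release_ms: float               # offset of labial or tongue-palate contact

    @property
    def closure_duration_ms(self) -> float:
        """Duration of the oral closure or constriction."""
        return self.release_ms - self.implosion_ms

    @property
    def implosion_to_peak_ms(self) -> float:
        """First timing measure: implosion to peak glottal opening."""
        return self.peak_glottal_opening_ms - self.implosion_ms

    @property
    def peak_to_release_ms(self) -> float:
        """Second timing measure: peak glottal opening to release.

        Negative when peak glottal opening occurs after the release.
        """
        return self.release_ms - self.peak_glottal_opening_ms


# Hypothetical token: peak glottal opening 10 ms after the oral release.
token = ObstruentToken(implosion_ms=0.0, peak_glottal_opening_ms=130.0, release_ms=120.0)
```

By construction the two measures sum to the closure duration, so a first measure close to the closure duration implies a second measure close to zero, i.e., peak glottal opening near the release.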



RESULTS 

Single Obstruents

Figure 1 shows representative tokens of the hearing subject's productions
of voiceless and voiced stops. A glottal abduction/adduction gesture is seen
in the transillumination signal for the voiceless stop but not for the voiced
cognate. Patterns of interarticulator timing are noted in the relationship
between events recorded in the signals representing labial/tongue-palate
contact and glottal opening, respectively. For the voiceless plosive, peak
glottal opening occurs at the oral release, indicated by the offset of lip
contact and the release burst. This pattern is the same as that found for
other speakers of American English (Löfqvist & Yoshioka, 1981).

Figure 2 shows selected tokens of the same utterances produced by one of
the deaf speakers. Several patterns are different from normal. First, closure
duration is considerably longer for the deaf than the hearing speaker's
productions. Second, there is evidence of an inappropriate glottal gesture.
The deaf speaker made a glottal abduction/adduction gesture immediately
preceding the test word, before the onset of lip closure for the initial stop.
Thus, for both productions, glottal adduction starts before lip closure, and
the glottis is in a position suitable for voicing at the release of the oral
closure. The abduction/adduction gesture between words was fairly typical of
the other deaf speakers as well, but was never observed for the hearing
speaker.

From these raw data, a number of measurements were made that are
summarized in Figures 3-4 and also in Figures 6-9. Line 1 in these figures
shows the mean duration of closure or constriction. Line 2 shows, as a
histogram, the number of instances of a glottal opening associated with the
obstruent production. The third row shows the first measure of
interarticulator timing, the interval between implosion and peak glottal opening. The
second measure of interarticulator timing is the interval from peak glottal
opening to release, indicated in numerals below the third row. A negative
value implies that peak glottal opening occurred after the release. The
presentation follows our general impression in rank order of overall speaker
intelligibility: (1) the hearing speaker; (2) deaf speaker 1 (felt to be the
most intelligible deaf speaker); (3) deaf speaker 2; and (4) deaf speaker 3.
Records of the hearing speaker's production of the utterances
"peak" (left) and "beak" (right). Curves represent labial/tongue-palate
(bottom). Onset of labial closure for the word-initial labial
closure by A. The vertical line indicates the time at which peak
glottal opening occurs.
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Results for the single voiceless and voiced obstruents are summarised in 
Figures 3 ***d 4, respectively , Closure or constriction duration was always 
longar for the daaf subjects than for the hearing subject, consistent with 
previous reports, As is typical for fearing speakers, closure or constriction 
duration was longer for voiceless than for voiced segments. For the deaf 
speakers* the duration Measurements for the voiceless and voiced segments 
overlapped isee also tmlow, Figure 

The number of tokens for which a glottal gesture occurred is shown in
line 2. These gestures were always correct for the hearing speaker and deaf
speaker 1. That is, for single voiceless obstruents, each token was
characterized by a single abduction/adduction gesture; for single voiced
obstruents, there was no laryngeal gesture. For the other deaf speakers, the
pattern varied. Deaf speakers 2 and 3 used an appropriate laryngeal gesture
more often for the alveolar than for the bilabial obstruents. We will discuss
the voiced obstruents of the deaf speakers below.

With respect to interarticulator timing, both the hearing speaker and
deaf speaker 1 showed nearly similar patterns for all segments. For voiceless
stops, the interval from implosion to peak glottal opening tends to be similar
to closure duration. This means that peak glottal opening and oral release
almost coincide. Thus, these two speakers both show a small negative number
for the second measure of interarticulator timing, i.e., the interval from
peak glottal opening to release. Even though the durations for the
deaf speaker are prolonged overall, the relative timing of oral and laryngeal
gestures is indistinguishable from normal. For the voiceless fricatives of
these two speakers, the interval from implosion to peak glottal opening is
roughly half of the duration of the oral constriction. Peak glottal opening
thus occurs about 100 msec before release.
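
The two timing measures used in this section are simple differences between event times, and a short sketch may make the sign convention concrete. The Python below is illustrative only: the `Token` class, its field names, and the example values are assumptions introduced here, not measurements or procedures from the study.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Event times (msec) for one obstruent token; names are hypothetical."""
    implosion: float      # onset of oral closure or constriction
    peak_glottal: float   # moment of peak glottal opening
    release: float        # offset of oral closure or constriction


def closure_duration(t: Token) -> float:
    """Duration of closure or constriction (line 1 in the figures)."""
    return t.release - t.implosion


def implosion_to_peak(t: Token) -> float:
    """First timing measure: implosion to peak glottal opening."""
    return t.peak_glottal - t.implosion


def peak_to_release(t: Token) -> float:
    """Second timing measure: negative when the peak follows the release."""
    return t.release - t.peak_glottal


# Made-up aspirated stop: peak glottal opening falls just after the release,
# so the first measure is close to closure duration and the second is
# a small negative number.
stop = Token(implosion=0.0, peak_glottal=125.0, release=120.0)
print(closure_duration(stop), implosion_to_peak(stop), peak_to_release(stop))
```

With these invented values, the first measure is nearly equal to closure duration and the second is slightly negative, which is the pattern described for voiceless stops.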

Deaf speaker 2 was inconsistent in production, since in most cases there
was no active glottal opening gesture for the stops. For the fricative, there
was an appropriate laryngeal gesture and interarticulator timing was more
normal. For deaf speaker 3 we again find an inconsistent pattern. For the
labials, there was no glottal opening, whereas for the alveolars, a glottal
opening gesture was made. The interarticulator timing in these cases is
similar to normal. For /t/, the glottis did not begin to close until
appreciably after the oral release, an interval that is somewhat long, although
not totally unusual. For the fricative, although the durations are long
overall, the relative timing pattern was similar to the pattern obtained for
normal speakers.

Usually, one does not discuss laryngeal-oral coordination for voiced
obstruent production, but since deaf speakers are known to produce voiceless
for voiced segments, we have also examined these productions. Figure 4 shows
these data. Here, we again find evidence that deaf speakers may use an
inappropriate laryngeal abduction gesture for the production of some voiced
sounds, but as before, the speakers are inconsistent in this aberrant pattern.

When the deaf speakers produced the appropriate laryngeal gestures for
voiceless stops, their overall pattern of interarticulator timing resembled
that of normals. Specifically, the oral release and peak glottal opening tend
to correspond in time. For fricatives, peak glottal opening precedes offset
of tongue-palate contact, as has been observed for normals. But a rather
unexpected finding was obtained for these deaf subjects. In general, the
laryngeal gesture for the voiceless fricative /s/ was produced correctly more
often than for the voiceless plosives. For example, as shown in Figures 3 and
4, deaf speaker 2 consistently contrasted stops and fricatives at the glottal
level: the former were nearly always produced with a closed glottis, while for
the latter, the glottis was always open. However, as shown in Figure 5, the
deaf speakers were unlike the normal in that they were highly variable in
their production from token to token. Standard deviations for the deaf
speakers were, in many cases, fairly large. For the hearing speaker, the
standard deviations were quite small (on the order of 10-25 msec) and were
therefore not included in the figure.

For all test words described above, obstruents were produced in the
word-initial position. An allophonic variation in American English is that
voiceless stops following a stressed vowel are unaspirated. Therefore, we
also examined stops produced in two different positions of a bisyllabic word,
"paper," where p1 is stressed and p2 is unstressed. These data are shown in
Figure 6. The timing pattern for the initial stops in this test word was
essentially the same as that described above for all speakers' production of a
single voiceless stop. For p2 the pattern is similar for the hearing subject
and deaf speakers 2 and 3. Closure duration was shorter in these cases and
there was a tendency not to use an abduction gesture in production. However,
deaf speaker 1 produced both initial and medial stops in an almost identical
way, with aspiration in both cases.

Measurements of Voice Onset Time for Single Stop Consonants (msec, n=6)

Table 3 shows measurements of voice onset time for single stops. These
acoustical measurements match fairly well with the physiological data, i.e.,
voice onset time was generally longer when a glottal gesture was found.
However, in contrast to the physiological data, the standard deviations for
the acoustic measurements were fairly small.
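
As a rough illustration of how a per-speaker summary of this kind could be computed from six tokens, the sketch below takes the mean and sample standard deviation of a list of VOT values. The numbers are invented for the example and are not data from Table 3; the function name `summarize` is likewise an assumption.

```python
from statistics import mean, stdev

# Hypothetical VOT measurements (msec) for six tokens of one stop.
# These are placeholder values, not measurements from the study.
vot_ms = [18.0, 22.0, 20.0, 25.0, 19.0, 22.0]


def summarize(values):
    """Return (mean, sample standard deviation) of a list of measurements."""
    return mean(values), stdev(values)


m, sd = summarize(vot_ms)
print(f"mean VOT = {m:.1f} msec, SD = {sd:.1f} msec (n = {len(vot_ms)})")
```

The same function could be applied to the physiological intervals, which would allow token-to-token variability at the two levels to be compared on equal terms.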

Data for affricates are shown in Figure 7. These segments are known to
be particularly difficult for deaf speakers to produce. For the hearing
subject, the stop closure and the fricative portion of the voiceless affricate
were 39 and 126 msec, respectively, with peak glottal opening occurring during
the fricative portion. In contrast, for the deaf speakers there was in most
cases no stop component. Consequently, the timing pattern resembled that of a
fricative. All deaf speakers produced the voiced affricates with a laryngeal
abduction gesture.

Clusters 

Clusters have not been studied much in the speech of the hearing
impaired. The common /st/ cluster was examined in the word-initial position
and in the medial unstressed position of a two-syllable word. Figure 8 shows
only one component of the cluster since we were often unable to identify two
separate gestures for the hearing-impaired speakers. Consequently, these
productions mostly resemble patterns described above for the single voiceless
fricatives. For the hearing speaker, when a voiceless unaspirated stop
followed a fricative, peak glottal opening is timed during the fricative
segment and the glottis begins to close before the stop component begins.
Deaf speaker 1 tended to use a timing pattern for an aspirated stop with peak
glottal opening at release. In some cases, two opening gestures occurred, one
for the fricative and one for the stop. For deaf speakers 2 and 3, in most
cases, interarticulator timing for the word-initial cluster more closely
resembled that observed for single fricatives. These timing patterns were
similar to normal in that peak glottal opening occurred during the fricative
portion. No clear pattern emerges for these speakers' productions of /st/ in
"Jester."

We finally turn to clusters with either a word or morpheme boundary
within the cluster; see Figure 9. In the first case, that of the word
boundary ("less tea"), we would expect that the word-initial stop /t/ would be
aspirated since aspiration here is a way of signaling that a word boundary
occurs between the /s/ and the /t/. In fact, all of the speakers, with the
exception of deaf speaker 2, produced these tokens with two separate glottal
gestures, one for the fricative and one for the stop. The patterns of deaf
speaker 2 are consistent with the previous observation that this deaf speaker
produced most stops without glottal opening, although for these test words, he
nevertheless respected the word boundary. The pattern of interarticulator
timing is similar to that observed for other tokens of fricatives and
aspirated stops.


Turning now to the effect of the morpheme boundary, the pattern for the
fricative segment is similar to that for other single fricatives. For the
stop segment, only the hearing speaker shows evidence of a separate laryngeal
adjustment. Deaf speakers 1 and 2 did not use a glottal opening. For deaf
speaker 3, no stop segment could be identified.

DISCUSSION

Normal speakers consistently use different patterns of laryngeal-oral
coordination for voiceless stops and fricatives (Löfqvist & Yoshioka, 1981).
Onset of glottal abduction generally tends to coincide with onset of oral
closure or constriction, unless preaspiration occurs, in which case glottal
abduction precedes implosion. For aspirated stops, peak glottal opening
occurs at the release of the oral closure. This ensures a delay in voice
onset time and also allows a high rate of air flow for generation of frication
noise immediately after the release. In fricatives, the peak glottal opening
occurs closer to the onset of the oral constriction. The velocity of the
abduction gesture is higher for fricatives than for stops and the size of the
glottal opening also tends to be larger for the fricatives. These differences
in laryngeal control and interarticulator timing are most likely related to
different aerodynamic requirements at implosion and release for fricatives and
aspirated stops, respectively. The hearing speaker in this study followed
these patterns.


The deaf subjects showed both similarities and dissimilarities with
respect to normal speakers. The most obvious dissimilarity was failure to
produce the voiced-voiceless distinction. The deaf speakers either made a
glottal gesture when none was required, or omitted the glottal gesture.
Furthermore, even when a laryngeal gesture was produced, its timing relative
to oral articulatory events could be more or less like normal; this pattern
varied considerably among deaf speakers.

Not surprisingly, deaf speaker 1, the most intelligible, closely followed
the normal pattern. For aspirated stops, peak glottal opening consistently
occurred at the oral release. The same strategy was used in production of the
second stop in the word "paper," although in this case, the phonological rules
of American English dictate that aspiration is not necessary. On the other
hand, while the timing for single fricatives was often produced correctly, the
/st/ cluster showed different patterns of interarticulator timing. One
example of this occurrence is illustrated by the /st/ cluster in "steal," where
relative timing was observed to be like that for an aspirated stop. Again,
this speaker uses an aspirated stop inappropriately, in this example as part
of a segment cluster.

Deaf speaker 2 differs from normal in a still grosser fashion. Stops
were consistently produced without laryngeal activity while fricatives were
usually produced with an appropriate glottal gesture. For these latter cases,
the interarticulator timing was relatively correct. Turning to deaf speaker
3, we note both incorrect and highly variable productions. However, when the
relative timing is preserved between the articulators, the absolute duration
of articulatory events is longer than that found for hearing speakers. This
pattern of increased duration has often been noted in the speech of the deaf
(Hudgins & Numbers, 1942; Calvert, 1961; Osberger & Levitt, 1979). In
relation to these findings, it is interesting to note that hearing speakers,
when deprived of auditory feedback, also show evidence of increasing duration
(Borden, 1980).


Another characteristic that marks the speech of the deaf as different
from normal is variability in production at the physiological level. This
variability appears to be an important factor in the speech of the deaf,
suggesting that deaf speakers, even the less intelligible, do not produce an
utterance in quite the same way each time it is perceived to be in error.
However, we also observed that even when speakers were judged to be correct in
their productions, there was considerable variability from token to token.
These results are consistent with electromyographic data obtained for oral
articulator timing (tongue-lips) of a deaf talker (McGarr & Harris, in
press). Variability in production was noted less at the acoustic level (VOT
measurements), although fairly large standard deviations for deaf speaker
productions have been reported (Monsen, 1976). Such inconsistencies in
production may be one reason why listeners find the speech of the deaf so
difficult to understand.

As mentioned above, all deaf speakers were more successful in producing
fricatives than stops. These results differ from those reported in the
literature (Nober, 1967; Smith, 1975; Levitt, Stromberg, Smith, & Gold, 1980).
On the one hand, we find our results perplexing since one would expect that
fricatives, because of their high frequency spectra and articulatory
invisibility, would be difficult for severely-profoundly deaf speakers to
perceive and thus to produce. Alternatively, on the physiological level, one
might postulate that voiceless fricatives, for example, require less precise
interarticulator timing than voiceless stops. At the laryngeal level, the deaf
speaker need only open the glottis, even if in a fairly stereotypic way as
demonstrated by our subjects, and then direct the air stream in an outward
direction. The distortion of the /s/ in the speech of the hearing impaired
may thus more accurately reflect poor placement of the upper articulators
rather than inappropriate laryngeal adjustments. Indeed, it is well known
that normally the /s/ is produced at the level of the upper articulators with
both channel and wake turbulence, the former being generated by the grooved
portion of the tongue, and the latter generated when the airstream strikes the
teeth. Deaf speakers are known to have difficulty positioning the tongue for
correct place of articulation (Huntington, Harris, & Sholes, 1968; McGarr &
Harris, in press). Plosives, on the other hand, demand particularly fine
interarticulator coordination between the larynx and the upper articulators
and more precise management of the air stream.

The operation of the larynx in speech is analogous to that of an air
valve, whereby the valve must be opened for voiceless sounds to let some air
escape, and must also be closed at the appropriate times in order to preserve
the breath stream. Studies of the respiratory patterns of deaf speakers have
shown that these subjects evidence at least two kinds of problems. The first
is that they initiate phonation at too low a level of vital capacity and
also that they produce a reduced number of syllables per breath (Forner &
Hixon, 1977; Whitehead, in press). A second problem is mismanagement of the
volume of air by inappropriate valving at the laryngeal level. Laryngeal
valving has two functions: articulatory and phonatory. For the former,
aerodynamic studies of deaf speech production do not consistently show that
hearing-impaired speakers produce obstruents with abnormally high air flow
rates (Whitehead, in press). One might infer phonatory valving problems from
some descriptive studies that often ascribe breathy voice quality to deaf
speakers (Hudgins & Numbers, 1942; Monsen, Engebretson, & Vemula, 1978;
Stevens, Nickerson, & Rollins, in press). The results of the present study
suggest valving problems of a somewhat different nature. That is, during
pauses between words, each of the deaf speakers in this study inappropriately
opened the glottis. Whether they actually took a breath, as is suggested in
the early work of Hudgins (1937), or simply wasted air cannot be ascertained
from our data; we would argue that the latter is more
likely since the glottal abduction gesture was smaller and shorter in duration
between words than between utterances. This pattern differs from one
hypothesized by Stevens et al. (in press). Based on spectrographic analysis of
deaf children's productions, these authors proposed that the glottis is closed
during pauses between words.

Turning to acoustics and perception, we find a rather straightforward
relationship between physiological records and acoustic measurements for
stops. The relationship between the physiological measurements and the
listener judgments was not always direct. Perception of both voiced and
voiceless obstruents could be found for tokens with and without a correct
laryngeal gesture. For example, for the productions of deaf speaker 2,
listeners heard /b/ for /p/, the common voiced for voiceless substitution,
when no glottal opening was found; cf. Table 1 and Figure 3. However, for the
alveolar stops of the same speaker, listeners reported a voiceless sound in
all cases, including those without a glottal abduction. From Table 3 it
appears that VOT was only 20 msec for these stops.

These results are not too surprising, since a straightforward
relationship between physiology and listener judgments is unlikely in such a
complex phenomenon as the voiced/voiceless distinction. This mismatch between
physiological records and listener judgments of deaf speakers has also been
noted by Mahshie (1980). Although in controlled studies using synthetic speech,
VOT has been shown to be an important determiner for the voiced/voiceless
distinction, in real speech there are a host of acoustic cues that may be
co-responsible for this perception. Measurements along one single acoustic
dimension cannot be readily expected to predict listener responses when other
acoustic variables are not held constant, since interactions have repeatedly
been shown to occur. Examples of such interactions that affect the perception
of the voiced-voiceless distinction in stops are amplitude and duration of
aspiration (Repp, 1979), and speech tempo and closure duration (Port, 1979;
Fitch, 1981; see also Miller, 1981). Our VOT values for the deaf speakers
were in the range of 20-30 msec, where interactions and boundary shifts are
most likely to occur. This may be another reason why listeners to deaf speech
have difficulty making judgments of particular phonetic segments.

Earlier, we argued that because the larynx is placed in an inaccessible
and invisible position, mastery of laryngeal articulation is arrived at by the
acoustic signal. The deaf speakers in this study all sustained
severe-profound hearing losses, suggesting that oral-laryngeal articulation
would be exceedingly difficult in light of reduced auditory acuity. In fact,
deaf speakers are often said to place their articulators fairly accurately,
especially for those places of articulation that are highly visible, but fail
to coordinate the movements between several articulators. Our data show that
this notion of deaf speech is in part correct, yet our subjects were also
capable of executing appropriate glottal gestures. We would argue that this
is in part due to low frequency residual hearing that conveys some voicing
information as well as tactile feedback.

There are other findings in studies of deaf speech that are also
perplexing and not satisfactorily accounted for by either residual hearing or
taction: prepausal lengthening (Reilly, 1979) and pitch declination
(Breckenridge, Note 1). If auditory monitoring of one's own voice was the sole
prerequisite for the establishment of these phenomena, one would not
necessarily expect to find them in profoundly deaf speakers. Quite possibly,
they may be due to intrinsic factors of the speech production system. This
idea may also account for why interarticulator timing was sometimes correct
for the hearing-impaired subjects of this study. Laryngeal articulatory
movements overall are rather stereotypic and restricted to abduction and
adduction. For example, production of a voiceless fricative involves opening
the glottis and letting air through. This bears some resemblance to non-speech
activities such as blowing and respiration. For the latter, it is reasonable
to assume that there exist respiratory-laryngeal linkages whereby glottal
abduction and adduction are automatically coordinated with respiratory
activity. Speech production in both normals and the deaf most likely utilizes
such linkages, although the details are unknown at present.

FOOTNOTE

¹For convenience in the following discussion, we will call the speech
characteristics of the group "deaf speech" and the speakers of "deaf speech"
will be called "deaf." By making this identification, we acknowledge that not
all persons who sustain severe to profound hearing losses produce this
characteristic speech.

ON FINDING THAT SPEECH IS SPECIAL*



Alvin M. Liberman+



Abstract. A largely unsuccessful attempt to communicate phonologic
segments by sounds other than speech led my colleagues and me to ask
why speech does it so well. The answer came the more slowly because
we were wedded to a "horizontal" view of language, seeing it as a
biologically arbitrary assemblage of processes that are not
themselves linguistic. Accordingly, we expected to find the answer in
general processes of auditory perception to which the acoustic
signal had been made to conform by appropriate regulation of the
movements of articulation. What we found was the opposite:
specialized processes of phonetic perception that had been made to
conform to the acoustic consequences of the way articulatory
movements are regulated. The distinctively linguistic function of these
specializations is to provide for efficient perception of phonetic
structures that can also be efficiently produced. To assume that a
phonetic specialization exists accords well with a "vertical" view
of language in which the underlying activities are seen as coherent
and distinctive. Recent evidence for such special processes comes
from experiments designed to investigate the integration of cues.



I welcome this opportunity to talk to my fellow psychologists about a
subject that has, I think, been too much taken for granted. The subject is
perception of phonetic segments, the consonants and vowels that lie near the
surface of language. My aim is to promote the hypothesis that perception of
those segments rests on specialized processes. These support a phonetic mode
of perception, they serve a distinctively linguistic function, and they are
part of the larger specialization for language.



*In press, American Psychologist.

+Also University of Connecticut and Yale University.

The phonetic specialization is apparently adapted to the singular code by
which phonetic structure is connected to sound, a code that owes its character
to the way the segments of the structure are articulated and coarticulated by
the organs of the vocal tract. Not surprisingly, then, phonetic processes
incorporate a link between perception and production. With that as key, an
otherwise opaque code becomes perfectly transparent: diverse, continuous, and
tangled sounds of speech are automatically perceived as a scant handful of
discrete and variously ordered segments. Moreover, the segments are given in
perception as distinctively phonetic objects, without the encumbering auditory
baggage that would make them all but useless for their proper role as vehicles
of language.

But we do take speech and its acoustic nature for granted, so much so
that it is, I suspect, hard to see why perception of phonetic segments should
require processes of an other-than-auditory sort, and even harder, perhaps, to
imagine what it might mean to perceive those segments as phonetic objects,
free of a weighty burden of auditory particulars. It may help, then, to begin
by recounting my experience with an attempt to transmit phonologic information
by purely auditory means. That experience exposed for me the problem that a
phonetic specialization might solve, though it did not, of course, reveal how
the solution is achieved, nor did it show that the solution requires
specialized processes. Evidence bearing on those matters is reserved for
later sections.

Perceiving Phonologic Segments in the Auditory Mode: An Assumption That Failed

In the mid-Forties I began, together with colleagues at Haskins
Laboratories, to design a reading machine for the blind (Cooper, 1950; Nye,
196?; Studdert-Kennedy & Cooper, 1966). This was, or was to have been, a device
that would scan print and use its contours to control an acoustic signal. At
the outset we assumed that our machine had only to produce, for each letter, a
pattern of sound that was distinctively different from the patterns for other
letters. Blind users would presumably learn to associate the sounds with the
letters and thus come, in time, to read. The rationale, largely unspoken, was
an assumption about the nature of speech, to wit, that the sounds of speech
represent the phonemes (roughly, the letters of the alphabet) in a
straightforward way, one segment of sound for each phoneme. Accordingly, the
perception of speech was thought to be no different from the perception of
other sounds, except as there was, in speech, a learned association between
perceived sound and the name of the corresponding phoneme. Why not expect,
then, that arbitrary but distinctive sounds would serve as well as speech,
provided only that the users had sufficient training?

Given that expectation, we were ill prepared for the disappointing
performance of the nonspeech signals our early machines produced. So we
persisted, seeking to increase the perceptual distinctiveness of the sound
alphabet and also the ease with which its units would form into words and
sentences. But our best efforts were unavailing. No matter how we patterned
them, the sounds evoked a clutter of auditory detail that subjects could not
readily organize and identify. This discouraged the subjects, but not me, for
I had faith that the difficulty would ultimately yield to practice and the
principles of learning. What loomed as a far more serious failing was that
modest increases in rate caused the unit sounds to dissolve into an
imperspicuous buzz. Indeed, this happened at rates barely one tenth those at
which the discrete units of phonetic structure can be conveyed by speech.

Having come, thus, to the conclusion that we should try to learn from
speech, we began to study it. But our hope at that early stage was only that
we might find principles of auditory perception, hitherto unnoticed, that the
language system had somehow managed to exploit.¹ These would not only be
interesting in their own right, but also useful in enabling us to overcome the
practical difficulty we had been having, since the auditory principles we
hoped to find could presumably be applied to the design of nonspeech sounds
our reading machine might be made to produce.

What I did not for a long time understand was that our practical
difficulty lay, not in our having failed to find the right principles of
auditory perception, but, much deeper, in our having failed to see that the
principles we sought were simply not auditory. Perhaps I should have arrived
at that understanding earlier had I not been in the grip of a misleading
assumption that had decisively shaped my thinking about speech, language, and,
indeed, almost anything else I might have found psychologically interesting.
I was the more misled because the assumption reflected what I took to be the
received view; in any case, I had never thought to question it.

In casting about for a word to characterize the view I speak of, I hit on
"horizontal" as being particularly appropriate, only to discover that
J. A. Fodor (Note 1) had chosen the same word to describe what I take to be
much the same view. Apparently, we have here a metaphor whose time has come.
As applied to language, the metaphor is intended to convey that the underlying
processes are arranged in layers, none of them specific to language. On that
horizontal orientation, language is accounted for by reference to whatever
combination of processes it happens to engage. Hence our assumption, in the
attempt to find a substitute for speech, that perception of phonologic
segments is normally accomplished, presumably in the first layer, by processes
of a generally auditory sort, that is, by processes no different from those
that bring us the rustle of leaves in the wind or the rattle of a snake in the
grass. To the extent we were concerned with the rest of language, we must
have supposed, in like manner, that syntactic structures are managed by using
the most general resources of cognition or intelligence. There were surely
other processes on our minds when we thought about language (attention,
memory, learning, for example), the exact number and variety depending on just
which aspects of language activity our attention was directed to at the
moment. But all the processes we might have invoked had in common that none
was specialized for language. We were not prepared to give language a biology
of its own, but only to treat it as an epiphenomenon, a biologically arbitrary
assemblage of processes that were not themselves linguistic.

The opposite view, the one to which I now incline, is, by contrast,
vertical. Seen this way, language does have its own biology. It is a
coherent system, like echolocation in the bat, comprising distinctive
processes adapted to a distinctive function. The distinctive processes are
those that underlie the grammatical codes of syntax and phonology; their
distinctive function is to overcome the limitations of communicating by
agrammatic means. To appreciate those limitations, we need only consider how
little we could say if, as in an agrammatic system, there were a
straightforward relation between message and signal, one signal, however
elaborately patterned, for each message. In such a system, the number of
messages to be communicated could be no greater than the number of
holistically and distinctively different signals that can be efficiently
produced and perceived; and surely that number is very small, especially when
the signal is acoustic. What the processes of syntax and phonology do for us,
then, is to encode an unlimited number of messages into a very limited number
of signals. In so doing, they match our message-generating capabilities to the
restricted resources of our signal-producing vocal tracts and our
signal-perceiving ears. As for the phonetic part of the phonologic domain,
which is the subject of this paper, I will suggest that it, too, partakes of
the distinctive function of grammatical codes, and that it is, accordingly,
also special. (For further discussion, see Mattingly & Liberman, 1969;
Liberman & Studdert-Kennedy, 1978; Liberman, 1970.)

The Special Function of the Phonetic Mode

To produce a large, indeed an infinite, number of messages with a small
number of signals, a syntax would, in principle, suffice. Without a
phonology, however, each smallest unit of an utterance would necessarily be a
word, so a talker would have to make do with a very small vocabulary. The
obvious function of the phonologic domain is, then, to construct words out of
a few meaningless units, and thus to make possible the large vocabularies that
human beings like to deploy. But the words of the vocabulary are presumably
to be found in the deeper reaches of the phonology, where they are represented
by the abstract phonemes that stand beneath the many phonetic variations at
the surface, variations associated with phonetic context, word boundaries,
rate of articulation, lexical stress, phrasal stress, idiolect, and dialect,
to name the most obvious sources. What remains in speaking is, of course, to
derive the surface phonetic structures, and then to transmit them by using the
organs of articulation to produce and modify sounds. Transmitting those
structures as sounds and at high rates becomes the distinctive function of the
phonetic mode.
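
The arithmetic behind the vocabulary argument can be made concrete with a
small sketch. It is only an illustration, not part of the original argument:
it assumes an inventory of 30 meaningless units (a round figure chosen for the
example) and counts every ordered string up to a chosen length, ignoring the
phonotactic restrictions that any real language imposes. The function name
vocabulary_capacity is hypothetical.

```python
def vocabulary_capacity(n_units: int, max_len: int) -> int:
    """Count the distinct ordered strings of length 1..max_len over n_units symbols."""
    return sum(n_units ** k for k in range(1, max_len + 1))


# Without a phonology, each word needs its own holistic signal,
# so the vocabulary can be no larger than the signal inventory.
holistic = vocabulary_capacity(30, 1)

# With a phonology, the same 30 units (an assumed inventory size)
# are strung together; strings of up to four units are counted here.
combinatorial = vocabulary_capacity(30, 4)

print(holistic, combinatorial)
```

Even with strings capped at four units, the count grows from tens to hundreds
of thousands, which is the sense in which a few meaningless units make a large
vocabulary possible.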

At average rates of speaking, talkers produce and listeners perceive
about 8 to 10 segments per second. In the extreme, the rate may go to 25 or
30 per second, at least for short stretches. Plainly, such rates would be
impossible if each segment were represented, as in the acoustic alphabets of
our early reading machines, by a segment of sound. The organs of the vocal
tract cannot make unit gestures that fast, and even if they could, the rate
of delivery of the resulting units of sound would overreach the temporal
resolving power of the ear. The trick, then, is to evade the limitations on
the rate at which discrete segments of sound can be transmitted and perceived,
while yet preserving the discrete phonetic segments those sounds must convey.

The vocal tract solves its part of the problem by breaking the two or
three dozen phonetic segments into a smaller number of features, assigning
each feature to a gesture that can be made more or less independently, and
then turning the articulators loose, as it were, to do what they can. A
consequence is that gestures corresponding to features of successive segments
are produced at the same time, or else greatly overlapped, according to the
constraints and possibilities inherent in the masses to be moved and in the
neuromuscular arrangements that move them. This is to say that the character
of speech is determined largely by the nature of the mechanisms that do the
speaking. But it could hardly be otherwise, for even if Nature had devised
articulators that could make successive unit gestures at rapid rates (putting
aside that this would presumably have destroyed the utility of the vocal tract
for such other purposes as eating and breathing), the resulting drumfire of
sound would, as I noted earlier, defeat the ear. At all events, the nature of
the articulatory process produces a relation between phonetic segment and
sound (the singular code I referred to in the introduction) that must, I
think, take first place in any attempt to investigate and understand the
perception of speech.

One characteristic of the code that should immediately engage our
attention follows from the fact that one or another of the articulators is
almost always moving. The consequence is that many, perhaps most, of the
potential acoustic cues (that is, aspects of the sound that bear a systematic
relation to the phonetic segment) are of a dynamic sort. Witness, for
example, the changes in formant frequency, caused by the movement from one
articulatory position to another and known to be important cues for various
consonants (and, indeed, for vowels) (Liberman, Delattre, Cooper, & Gerstman,
1954; O'Connor, Gerstman, Liberman, Delattre, & Cooper, 1957; Mann & Repp,
1980; Strange, Jenkins, & Edman, 1977). How do these time-varying acoustic
cues evoke discrete and unitary phonetic percepts that have no corresponding
time-varying quality?

Another characteristic of the code, owing again to the way the
articulators produce and modulate the sound, is that the acoustic cues are
numerous and diverse. In the contrast between the [b] of rabid and the [p] of
rapid, for example, Lisker (1978) has so far identified sixteen cues,
representing a variety of acoustic types. The many cues are not ordinarily of
equal power (some will override others), but power does not appear to be
determined primarily by acoustic prominence. How, then, is such a numerous
variety of seemingly arbitrary cues bound into a single phonetic percept?



Finally, the processes of articulation, and more particularly
coarticulation, cause the potential cues for a phonetic segment to be widely
distributed through the signal and merged, often quite thoroughly, with
potential cues for other segments. In a syllable like bag, to take a simple
case, it is likely that a single parameter of the acoustic signal (say, the
second formant) carries information simultaneously about at least two of the
constituent segments and, in some places, all three (Cooper, Delattre,
Liberman, Borst, & Gerstman, 1952; Liberman, 1974). Indeed, it is this
characteristic of speech, this encoding of several phonetic segments into one
segment of sound, that is, as we have seen, an essential aspect of the
processes by which phonetic segments are produced and perceived at high rates.
But the result is an acoustic amalgam, not an alphabet. How does the listener
recover from it the string of discrete phonetic segments it encodes?

Of course, we might try to evade those questions, and the thorny problems
they pose for the auditory mode, by supposing that the articulators produce,
for each phonetic segment, at least one cue that represents the segment quite
straightforwardly (Stevens & Blumstein, 1981). Because the relation of that
cue to the phonetic segment is transparent to ordinary auditory processes, the
listener might respond most attentively just to it, dismissing the others as
so much chaff, or else learning to accept them as associated with, but wholly
incidental to, the real business of talker and listener. Such evasion will be
hard to maintain, however, if, as we now have reason to think, the typical
listener is sensitive to all the phonetic information in speech sounds (Bailey
& Summerfield, 1980).2 Certainly every potential cue so far tested has proved
to be an actual cue, no matter how peculiar seeming its relation to the
phonetic segment.

We should suppose, then, that there is in speech perception a process by
which the manifold of variously merged, continuous, and time-varying cues is
made to form in the listener's mind the discrete and ordered phonetic segments
that were produced by the speaker. But it seems hardly conceivable that this
could be accomplished by processes of a generally auditory sort. Therefore, I
assume, as I said in the introduction, that the process is a special one, a
distinctively phonetic process, specifically adapted to the unique
characteristics of the speech code. Since that code is opaque except as one
understands the special way it comes about, I find it plausible to suppose,
further, that a link between perception and production constrains the process
as if by knowledge of what a vocal tract does when it makes linguistically
significant gestures (Cooper et al., 1952; Liberman, Delattre, & Cooper,
1952).

A Special Process of the Phonetic Mode: Integration of Cues

Of the many experimental results that bear on the existence and nature of
distinctively phonetic processes, none is critical; what tells is the weight
of the evidence and the way it converges on certain conclusions. Faced, thus,
with many more results than I could hope to include, I had to choose between
picking a closely related few and, alternatively, offering a token of each
type. (For recent and comprehensive reviews, see Repp, 1981; Studdert-Kennedy,
1980.) I have chosen the related few, selecting them from recent studies that
bear on the three questions raised by the characteristics of the speech code I
referred to in the previous section. Aspects of these questions have long
been worried about as the problem of "segmentation": how is the acoustic
signal "divided" into phonetic segments (Cooper et al., 1952; Fant, 1962;
Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967)? Recently, Repp
(1978) and Oden and Massaro (1978) have looked at the other side of the coin,
putting attention on the problem of "integration": how do cues combine to
produce the percept? It suits my purposes to adopt their perspective, and so
I will.

Integration of a time-varying sound. Frequency sweeps (called formant
transitions) of the kind shown in Figure 1 can be sufficient cues for the
perceived distinction between the stop consonants [d] and [g] in the syllables
[da] and [ga] (Harris, Hoffman, Liberman, & Delattre, 1958). But, as I asked
earlier, how are such frequency sweeps integrated (as information about the
phonetic dimension of "place") into a unitary percept, [d] or [g], that has
about it no hint of a corresponding sweep in pitch? Two interpretations are
possible: one, that the integration is accomplished by ordinary auditory
processes; the other, that special phonetic processes come into play.

On an auditory interpretation, one might suppose, most simply, that this
is an instance of low-level sensory integration, something like the well-known
integration of intensity and time into the perception of loudness. That
possibility is quickly ruled out, however, by the observation that when the
transition cues are removed from the pattern and presented alone, as in the
part of the figure at lower right, listeners do perceive a rising or falling
"chirp," almost a glissando, that conforms reasonably to the time-varying
percept that psychoacoustic considerations might have led us to expect
(Mattingly, Liberman, Syrdal, & Halwes, 1971).



But the auditory theory is not so easily disposed of, because it can
always fall back on the assumption that the formant transitions collaborate
with the rest of the pattern in an interaction of a purely auditory sort, from
which the percepts, [d] or [g], emerge. It matters little that there is
nothing in what we know about perception of complex sounds to suggest that
such interaction should occur, for we know very little about perception of
complex sounds. Nor does it necessarily matter how implausible it is to
suppose that the articulators could so comport themselves as to produce
exactly the right combination of sounds, not just in this instance, but in the
myriad others that must occur as the articulators accommodate to variations
in, for example, phonetic context, rate, and linguistic stress. Such
considerations make an explanation based on auditory interaction endlessly ad
hoc, but they do not, in principle, rule it out.

A phonetic interpretation, on the other hand, would have it that the
integration of the formant transitions into a unitary percept reflects the
operation of a device specialized to perceive the sounds in a linguistically
appropriate way. As for what is linguistically appropriate, it is plain that
perceiving the transitions as rising or falling chirps is not. Language,
after all, has no use for that kind of auditory information; it only requires
to know whether the segment was [d] or [g]. Indeed, if the chirps and other
curious auditory characteristics of speech sounds were heard as such, they
would intrude as an intermediate stage of perception that had, itself, to be
interpreted, however automatically. In that case, listening to speech would
be like listening to the acoustic alphabets of our early reading machines, or
to Morse code, and that would surely be awkward in the extreme.

What is required, if the time-varying transitions are to be perceived
(appropriately) as unitary segments, is that the percept reflect neither the
proximal sound nor the more distal movements it betokens, but rather the still
more distal, and presumably more nearly unitary, neural command structure that
occasioned the movements. A less timid writer might call that the talker's
phonetic intent.

But whatever the percept exactly corresponds to, I suppose that Nature
provided a device that is well adapted to its linguistic function, which is to
make available to the listener just those phonetic objects he needs if he is
to understand what the speaker said. But Nature could not have anticipated
the development of synthetic speech and dichotic stimulation, so it is
possible to defeat her design in such a way as to discover something about
what the design is. To do this, we use a method that derives from a discovery
by Rand (1974). (See also Isenberg & Liberman, 1978; Liberman, 1979.) Its
special feature is a way of presenting patterns of synthetic speech so that an
acoustic cue is perceived as a nonspeech sound and, simultaneously, as support
for a phonetic percept. The obvious advantage of the method is that it holds
the stimulus input constant while yet producing two percepts, thus providing a
control for auditory interaction. Recently, the method has been applied by
Mann, Madden, Russell, and Liberman (1981; Note 2) to determine how a
time-varying formant transition is integrated into the perception of a stop
consonant. The experiment was as follows.

To one ear we presented one or another of the nine formant transitions,
as shown at the lower right of Figure 1. By themselves, these isolated
transitions sound like time-varying chirps, that is, like reasonably faithful
auditory reflections of the time-varying acoustic signal. To the other ear,
we presented all the rest of the pattern (the base, so called) that is shown
at the lower left of the figure. By itself, the base is always perceived as a
stop-vowel syllable; most listeners hear it as [da], some as [ga].

When these two stimuli are presented dichotically, listeners report a
duplex percept. On one side of the duplexity, the listeners perceive the
syllable [da] or [ga], depending on the identity of the isolated transition.
This speech percept is seemingly no different from the one that would have
been produced had the base and the isolated transition been electronically
mixed and presented in the normal manner. On the other side, and at the same
time, the listeners perceive a nonspeech chirp, not perceptibly different from
what they experience when the transition is presented by itself. Thus, given
exactly the same acoustic context, and the same brain, the transition is
simultaneously perceived in two phenomenally different ways: as critical
support for a stop consonant, in which case it is integrated into a unitary
percept, and as a nonspeech chirp, in which case it is not.

To go beyond the phenomenology just described, we determined how the
transitions would be discriminated, depending on which side of the duplex
percept the listener was attending to. For that purpose, we sampled the
continuum of formant transitions by pairs, choosing, as members of each
to-be-discriminated pair, stimuli that were three steps apart on the continuum
of formant transitions shown in Figure 1. These we presented in an AXB format
(A and B being the two stimuli to be discriminated and X being the one or the
other) to subjects who were instructed to decide on the basis of any
perceptible difference whether X was more like A or like B. When the
subject's attention was directed to the speech side of the duplex percept, we
obtained results represented in Figure 2 by the solid line; with attention
directed to the nonspeech side, we obtained the results shown by the dashed
line. The difference is obvious. When the transitions support stop
consonants (that is, when they are perceived in the phonetic mode), the
discrimination function has a rather high peak, the location of which
corresponds closely to the phonetic boundary. This is the familiar tendency
toward categorical perception that characterizes segments such as these, a
tendency that is, itself, rather highly adaptive, since it is only the
categorical information (the segment is categorically [d] or [g]) that is most
relevant linguistically. When the same transitions are perceived, on the
nonspeech side of the percept, as chirps, the discrimination function, shown
as the dashed line and open circles, is different; in fact, it is nearly
continuous.3 Thus, the discrimination functions confirm the more blatantly
phenomenological results described earlier. Both indicate that integration of
the formant transition into a phonetic percept is owing to a special process
that makes available to perception a unitary phonetic object well suited to
its role in language.
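
The sampling and presentation scheme used in this discrimination test can be
sketched in a few lines. This is an illustration under stated assumptions,
not the authors' procedure: the text specifies nine transitions, pairs three
steps apart, and an AXB format, but the counterbalancing of the four orderings
and the names three_step_pairs, axb_trials, and proportion_correct are
hypothetical.

```python
def three_step_pairs(n_stimuli=9, step=3):
    """Pairs of continuum positions (1-based) that lie `step` steps apart."""
    return [(i, i + step) for i in range(1, n_stimuli - step + 1)]


def axb_trials(a, b):
    """The four AXB orderings for one pair, as (first, X, last, correct_answer).

    X always matches either the first or the last stimulus; the correct
    answer names which one. Using all four orderings is an assumption here.
    """
    trials = []
    for first, last in ((a, b), (b, a)):
        for x in (first, last):
            answer = "first" if x == first else "last"
            trials.append((first, x, last, answer))
    return trials


def proportion_correct(responses, trials):
    """Fraction of trials on which the response matches the correct answer."""
    hits = sum(r == t[3] for r, t in zip(responses, trials))
    return hits / len(trials)


pairs = three_step_pairs()            # (1, 4), (2, 5), ..., (6, 9)
trials = axb_trials(*pairs[0])
# A listener who always answers "first" is right on half of the trials.
print(len(pairs), len(trials), proportion_correct(["first"] * 4, trials))
```

Plotting proportion_correct for each pair against the pair's position on the
continuum gives a discrimination function of the kind compared across the two
sides of the duplex percept.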

The phonetic process that integrates the transitions has other
characteristics, of course, including one that has attracted attention for a
long time: it adjusts perception to variations in the acoustic signal when
those are caused by coarticulatory accommodation to changes in phonetic
context; thus, it seems to rest on a link between perception and production
(Liberman et al., 1952; Mann, 1980; Mann & Repp, 1981). A second part of the
experiment just described was designed to examine that perceptual adjustment
to phonetic context, and to exploit the duplex percept to identify the domain,
auditory or phonetic, in which it occurs. To that end, we took advantage of
an earlier experiment by Mann (1980) in which she had found that placing the
syllables [al] or [ar] in front of the [da]-[ga] patterns caused the position
of the [da]-[ga] boundary (on the continuum of formant transitions) to shift,
toward the [g] end for [ar] and the [d] end for [al]. Since the shift was
consistent with the change in [da]-[ga] articulation that can be shown to
occur when the syllable [al] or [ar] is spoken immediately before, Mann
inferred that this was, indeed, a case in which the perceptual system had
automatically reflected coarticulation and its acoustic consequences.

Our further contribution to Mann's result was simply to repeat her
experiment, but with the "duplex" procedure (and with measures of
discrimination substituted for the phonetic identifications she had used).
The outcome was quite straightforward. On the speech side of the duplex
percept we (in effect) replicated the earlier result, as shown by the results
displayed in Figure 3. Taking the discrimination data obtained with the
isolated [da]-[ga] syllables (solid line connecting solid circles) as
baseline, we see that placing the syllable [ar] in front caused the
discrimination peak (and presumably the phonetic boundary) to move to the
right, toward the [g] end of the continuum of transitions. When [al]
preceded, the peak (and the boundary) apparently shifted in the opposite
direction, that is, to the left, toward [d]; for some subjects, indeed, it
shifted so far as to move off the stimulus continuum, so there is, for them,
no effective boundary, which explains why the peak is so low. For present
purposes, however, the point is simply that there are large effects of prior
phonetic context on discrimination of the transitions when those are perceived
on the speech side of the duplex percept. On the other hand, as we see in
Figure 4, the nonspeech side of the percept is unaffected by phonetic context:
discrimination of the formant transitions is the same whether the base was
preceded by [al], by [ar], or by nothing.
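
One way to see why a shift in the phonetic boundary should carry the
discrimination peak with it is the labeling-only prediction familiar from work
on categorical perception, in which two stimuli are discriminated only to the
extent that they are labeled differently. The sketch below is an
illustration, not the analysis used in the experiment: the logistic
identification function, its slope, the boundary positions, and the function
names are all assumptions.

```python
import math


def p_d(step, boundary, slope=1.5):
    """Assumed logistic identification function: P(stimulus is labeled [d])."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def predicted_axb(a, b, boundary):
    """Labeling-only prediction: chance (0.5) plus half the squared
    difference between the two stimuli's labeling probabilities."""
    diff = p_d(a, boundary) - p_d(b, boundary)
    return 0.5 + 0.5 * diff ** 2


def peak_pair(boundary, n_stimuli=9, step=3):
    """The three-step pair with the highest predicted discrimination."""
    pairs = [(i, i + step) for i in range(1, n_stimuli - step + 1)]
    return max(pairs, key=lambda p: predicted_axb(p[0], p[1], boundary))


# Hypothetical boundary positions: a baseline, and one displaced toward
# the [g] end of the continuum, as reported for a preceding [ar].
print(peak_pair(4.5), peak_pair(6.5))
```

Under these assumptions the best-discriminated pair is the one that straddles
the boundary, so moving the boundary toward the [g] end moves the peak in the
same direction. A boundary pushed off the continuum would leave every pair
near chance, which fits the low peak described for [al].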

Putting the two experiments together, we conclude that, given a single
acoustic context, exactly the same formant transitions are perceived in two
different modes. In the one mode, they evoke nonspeech chirps that have a
time-varying quality corresponding, approximately, to the time-varying
stimulus; changes in the transitions are perceived continuously; and
perception is unaffected by phonetic context. This is, of course, the
auditory mode. In the other mode, the same transitions provide critical
support for the perception of stop consonants that lack the time-varying
quality of the nonspeech chirps; changes in the transitions are perceived more
or less categorically; and perception is markedly affected by phonetic
context. This is the phonetic mode.

Figure 3. Discriminability of the formant transitions on the speech side of
the duplex percept.

Figure 4. Discriminability of the formant transitions on the nonspeech side
of the duplex percept under conditions identical to those represented in
Figure 3 (from Mann, Madden, Russell, & Liberman, Note 2).

[Figure 5 panel labels: base with and without silence (to one ear); isolated
transitions (to other ear); duplex-producing (dichotic) presentation.]

Figure 5. Schematic representations of the stimulus patterns used to
determine whether the importance of silence as a cue is owing to auditory or
phonetic factors. (From "Duplex perception of cues for stop consonants:
Evidence for a phonetic mode," by A. M. Liberman, D. Isenberg, and B. Rakerd,
Perception & Psychophysics, in press. Copyright by the Psychonomic Society,
Inc. Reprinted by permission.)

Integration of sound and silence. Perception of a phonetic segment
typically depends, as I indicated earlier, on the integration of several (many
may be a more appropriate word) acoustic cues. Even in the case of [da] and
[ga] just described, there was one other cue, silence preceding the
transitions, though I did not remark it. To show the effect of such silence,
an effect long known to researchers in speech (Bastian, Delattre, & Liberman,
1959), we must put the stop consonant and its transition cues into some other
position, as in the examples [spa] and [sta] shown at the top of Figure 5. As
we see there, an important cue for perception of stop consonants (in this
case, [p] and [t]) is a short period of silence between the noise of the
fricative and the formant transitions that introduce the vocalic part of the
syllable (Dorman, Raphael, & Liberman, 1979).

But why is silence necessary, and in which domain, auditory or phonetic,
is it integrated with the transition cues to produce stop consonants? On an
auditory account, we might suppose that there is forward masking of the
transition cues by the fricative noise, in which case the role of the
intervening silence is to provide time for the transitions to evade masking.
Failing that, we could, as always, invoke some previously unnoticed
interaction between frequency sweeps (transitions) and silence that is
presumed to be characteristic of the way the auditory system works.

A phonetic interpretation, on the other hand, takes account of the fact
that presence or absence of silence supplies important phonetic information:
to wit, that the talker closed his vocal tract, as he must to produce the [p]
and [t] in [spa] and [sta], or that he did not, as he does not when he says
[sa]. Presumably, the processes of the phonetic mode are sensitive to the
phonetic significance of the information that silence imparts.

To decide between these interpretations, the phenomenon of duplex
perception was again exploited (Liberman, Isenberg, & Rakerd, in press). As
shown in Figure 5, base stimuli that sometimes did, and sometimes did not,
have silence were presented dichotically with transition cues appropriate for
[p] or for [t]. Two such dichotically yoked patterns were presented on each
trial; subjects were asked to identify the speech percepts and to discriminate
the nonspeech chirps. The result was that the subjects fused the transitions
with the base and accurately perceived [sa], [spa], or [sta], depending on the
presence or absence of silence in the base (to one ear) and the nature of the
formant transitions (to the other). But the subjects also perceived the
transitions as nonspeech chirps, and accurately discriminated them as same or
different regardless of whether or not there was silence in the base. Thus,
duplex perception did occur, and silence affected the identification of the
speech, but not the discrimination of the nonspeech.

In a further experiment, the investigators provided a more severe test by
asking subjects to discriminate their percepts on both sides of the duplexity.
For that purpose, two dichotically yoked pairs of stimuli were presented on
each trial, so arranged as to exhaust all combinations of silence-no silence
in the base and [p]-[t] cues in the isolated transitions. Subjects were
asked, for each pair of percepts, to rate their confidence that a difference
of any kind had been detected. The results are shown in Figure 6. There are
but two critical comparisons. The first is in the leftmost third of the
figure, in the condition in which there was no silence in either of the two
base stimuli presented to the one ear (labelled "No Silence - No Silence") and
the two transition cues to the other ear were different (labelled simply
"Different"). On the speech side of the duplexity (open bar), we see that the
difference between the transitions was not clearly detected, presumably
because, in the absence of silence in either base stimulus, subjects perceived
[sa] in both cases. But, on the nonspeech side (shaded bar), that same
difference was detected; here, the absence of silence in the base made no
difference. The other critical comparison is seen in the bars immediately to
the right, in the middle third of the figure, representing the condition that
had, in the one ear, silence in one base stimulus but not the other, and, in
the other ear, two transition cues that were the same. On the speech side of
the duplex percept, we see that the patterns were perceived as very different,
even though the transition cues were the same; presumably, this was because
one percept, being influenced by the presence of silence, included a stop
consonant, while the other, being influenced by the absence of silence, did
not. The result on the nonspeech side stands in contrast. There, the
percepts were judged to be not very different, accurately reflecting the fact
that they were, in fact, not different.

Thus, in both critical comparisons, silence affected discrimination of
the transitions only on the speech side of the duplex percept. Apparently,
its importance depends on distinctively phonetic processes; and its
integration with the transition occurs in the phonetic mode.

The integration of silence and transitions, in the patterns just
described, reinforces the suggestion, made earlier in regard to the
integration of the transitions alone, that the perceived object is not to be
found in the movements of the speech organs at the periphery, but rather at
some still more distal remove, as suggested by Repp, Liberman, Eccardt, and
Pesetsky (1978). To see the point more clearly, we should first take note of
a finding that adds another cue for the [p] in [spa]: the shaping of the
fricative noise that is caused by the way the vocal tract closes for [p]
(Summerfield, Bailey, Seton, & Dorman, 1981). Now we have three acoustic cues
that correspond neatly to three corresponding aspects of the articulation.
There is, first, the shape of the fricative noise, which signals the closing
of the tract; then the silence, which signals the closure itself; and finally
the formant transitions, which signal the subsequent opening into the vowel.
If these three acoustic cues are integrated into a percept that does not
display at least three constituent elements, then the perceived object must be
upstream from the peripheral articulation. A likely candidate, as suggested
earlier, is the unitary command structure from which the various movements at
the periphery unfolded.

Integration of periodic sound and noise. When a talker closes his vocal
tract to produce a stop consonant and then opens it into a following vowel,
the resulting silence and formant transitions are, as we have seen, integrated
into a stop consonant. It is surely provocative that similar formant
transitions are produced, but without the silence, when a talker almost closes
his vocal tract so as to make the noise of a fricative (e.g., [s]), and then
opens into the vowel, for in such cases the formant transitions do not support
stops; they are, instead, integrated with the noise into the perception of a
fricative (Harris, 1956; Mann & Repp, 1980; Whalen, 1980). Such integration
is shown in Figure 7, where I have reproduced the results of a recent
experiment by Repp (in press). What we see in the figure are the judgments
[ʃ] or [s] made to stimuli that were constructed as follows. The experimental
variable, ranged on the abscissa, was the position on the frequency scale of a
patch of band-limited noise as it moved between a place appropriate for [ʃ]
and one appropriate for [s]. The parameters were the nature of the
(following) formant transitions (appropriate, in the one case, for [s] and, in
the other, for [ʃ]) and the two vowels [a] and [u]. We see that the
transitions (and also the vowels) affected the perception of the fricative.

Though not shown in this particular experiment, I would note,
parenthetically, that patterns like these, but with 50 msec of silence
inserted between the fricative noise and the vocalic section, will be
perceived, not as fricative-vowel syllables, but as fricative-stop-vowel
syllables (Mann & Repp, 1980). That is, inserting 50 msec of silence will
cause the formant transitions to be integrated, not into fricatives, but into
stops. It is difficult to account for that as an auditory effect, but easy to
see how it might reflect a special sensitivity to information about a
difference in articulation that changes the phonetic "affiliation" of the
acoustic transitions.

In a further, and more severe, test of the integration of transitions and
fricative noise that we saw in Figure 7, Repp measured the effect of the
formant transitions on the way listeners discriminated variations in the
frequency position of the noise patch, using for this purpose the highly
sensitive method of "fixed standard." He found two distinctly different types
of discrimination functions. One clearly showed an effect of the formant
transitions and reflected nearly categorical perception; the other just as
clearly showed no effect of the formant transitions and represented perception
that was nearly continuous. Which type Repp obtained in each particular case
depended, apparently, on the listener's ability to isolate or "stream" the
noise, that is, to create an effect similar, perhaps, to the one obtained by
Cole and Scott (1973) when they found with fricative-vowel syllables that, as
a result of repeated presentation, the noise and vocalic sections would form
separate "streams" that had little apparent relation to each other. At all
events, we have here another instance, though occurring in a different
phonetic class and obtained by very different methods, of a single acoustic
pattern that is perceived in two distinctly different ways. One reflects the
integration of cues in the phonetic mode, the other the "nonintegration" of
the same acoustic elements in the auditory mode.

There is still another method that exploits the possibility of perceiving
exactly the same stimulus pattern in two ways, and thus enables us to test yet
again whether the integration of formant transitions and noise occurs in the
phonetic or auditory modes. But, now, the two ways of perceiving are not
speech versus nonspeech, as in the experiments described thus far, but rather
two kinds of speech, namely, fricatives and stops. The relevant experiment is
a recent one by Carden, Levitt, Jusczyk, and Walley (1981). Starting with
synthetic patterns that produced stop-vowel syllables, they varied the
second-formant transitions and found the boundary between [b] and [d]. Then
they placed in front of these patterns a fixed patch of band-limited noise,
neutralized as between the fricatives [f] and [θ]. In these patterns, the
formant transitions cue the difference between the fricatives, but, because
the place of vocal-tract constriction is different for the two fricatives, on
the one hand, and the two stops, on the other, the perceptual boundary on the
continuum of formant transitions is now displaced. That is, exactly the same
formant transitions distinguish the fricatives differently from the way they
distinguish the stops. The effect seems most plausibly to be phonetic,
reflecting the listener's "knowledge," as it were, of the difference in
articulatory place of production between the stops, [b] and [d], on the one
hand, and the fricatives, [f] and [θ], on the other. But, just to make sure,
Carden and his collaborators presented the patterns with the noise patch to
one group of subjects and boldly asked them to perceive stops; then, in
precisely reverse fashion, they presented the patterns without the noise patch
to a second group with instructions to perceive fricatives. The listeners'
judgments reflected boundaries on the continuum of transitions that were
appropriate to the class of phonetic segments ([b] vs. [d] or [f] vs. [θ])
they were asked to hear. Thus, exactly the same acoustic patterns yielded
different boundaries on the continuum of transitions, depending on whether the
listeners were perceiving the patterns as stops or as fricatives.
Discrimination functions were also obtained, and these confirmed the boundary
shift. We see, then, that transition cues like those that integrate with
silence to produce a stop consonant will integrate with noise to produce a
fricative. In both cases, the integration is in the phonetic mode.

The equivalence of sound and silence when integrated. Implicit in the
discussion so far is the assumption that when acoustic cues integrate to form
a phonetic percept, they are, for that purpose, perceptually equivalent;
otherwise, it would make no sense to speak of the percept as unitary. It is
not implied that the cues are necessarily of equal importance or power, only
that their separate contributions are not sensed as separate. But even that
implication is of interest from a theoretical point of view, because the cues
are often very different acoustically, having in common only that they are the
common products of the same linguistically significant gesture. Hence their
equivalence is to be attributed, most reasonably, to the link between
perception and production that presumably characterizes phonetic processes.

But the implied equivalence of diverse cues is so far just that: implied.
To test the equivalence more directly was the purpose of several experiments.
One of these, by Fitch, Halwes, Erickson, and Liberman (1980), was designed to
examine the equivalence of silence and formant transitions in perception of
the stop consonant in split as opposed to its absence in slit. Synthetic
patterns like those shown in Figure 8 were used. The variable was the
duration of silence between the fricative noise and the vocalic portion of the
syllable; the parameter of the experiment was the nature of the formant
transitions at the start of the vocalic section, set so as to bias that
section toward [lit], in the one case, and toward [plit] in the other. When
stimuli that had been constructed in this way were presented for
identification as slit or split, the results shown in Figure 9 were obtained.
One sees there a trading relation not different in principle from those found
by other investigators with other cues. (For a review, see, again, Repp,
1981.) The displacement of the two response functions indicates that, for the
purpose of producing the [p] in split, about twenty msec of silence is equal
to appropriate formant transitions.⁴ Thus, silence is equivalent to sound, but
only, I should think, when both are produced as parts of the same phonetic
act.
Figure 9. Effect of silent interval on perception of /slit/ vs. /split/ for
the two settings of the transition cue. (From "Perceptual equivalence
of two acoustic cues for stop-consonant manner," by
H. L. Fitch, T. Halwes, D. M. Erickson, and A. M. Liberman,
Perception & Psychophysics, 1980, 343-350. Copyright 1980 by
the Psychonomic Society, Inc. Reprinted by permission.)
Of course, it might be argued that the splits produced by the two
different combinations of silence and sound were not really equivalent, but
the forced-choice identification procedure, permitting only the responses slit
or split, gave the subjects no opportunity to say so. Against that
possibility, we carried out another experiment, designed to determine how well
the subjects could discriminate selected combinations of the stimuli on any
basis whatsoever. The rationale for selection of stimuli was as follows. If
the two cues, silence and sound, are truly equivalent in phonetic perception,
their perceptual effects should be algebraically additive, as it were. Thus,
given two synthetic syllables to be discriminated, and given a base-line level
of discriminability determined for pairs of stimuli that differ in only one of
the cues, it should be possible to add the second cue so as to increase or
decrease discriminability, according as the phonetic "polarity" of the two
cues causes their effects to work together or at cross purposes. The cues
should "summate," or "cooperate," when they are biased in the same phonetic
direction, as when one of the syllables to be discriminated combines a silence
cue that is longer by the amount of the "trade" with transition cues of the
[plit] type, and the other syllable combines a silence cue that is shorter by
the amount of the "trade" with transition cues of the [lit] type. They should
"cancel" each other, or "conflict," when the opposite pairing is made, that is,
when the longer silence cue is combined with transition cues of the [lit]
type, and the shorter silence cue with transition cues of the [plit] type.
Pairs of stimuli meeting those specifications, and sampling the continuum of
silence durations, were presented for forced-choice discrimination. As
shown in Figure 10, discrimination of patterns differing by both cues was,
in fact, either better or worse than patterns that differed by only one,
depending on whether the cues were calculated to "cooperate" or to
"conflict." Apparently, the effects of the two cues did converge on a single
perceptual object. By this test, then, the cues may be said to be equivalent,
and the percept may be said to be truly unitary.
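
The additive logic of this test can be made concrete with a small numerical
sketch. It assumes, purely for illustration, that both cues map onto a single
phonetic dimension measured in msec of silence, and that [plit]-type
transitions are worth the trade of about 20 msec mentioned in the text. The
60-msec baseline and all function names are hypothetical and are not taken
from Fitch et al.

```python
# Hypothetical additive model: two cues converge on one phonetic
# decision variable. An illustration only, not the authors' analysis.

TRADE_MS = 20.0  # silence treated as equivalent to [plit] transitions (from the text)


def phonetic_value(silence_ms, transitions):
    """Combined evidence for "split", expressed in msec of silence.

    `transitions` is "plit" or "lit"; [plit]-type transitions count as
    TRADE_MS of extra silence, [lit]-type transitions as none.
    """
    bonus = TRADE_MS if transitions == "plit" else 0.0
    return silence_ms + bonus


def phonetic_distance(stim_a, stim_b):
    """Separation of two (silence_ms, transitions) stimuli on the single
    phonetic dimension; a larger value predicts better discrimination."""
    return abs(phonetic_value(*stim_a) - phonetic_value(*stim_b))


base = 60.0  # hypothetical shorter silence duration, in msec

# Pair differing in silence only.
one_cue = phonetic_distance((base + TRADE_MS, "lit"), (base, "lit"))
# Longer silence with [plit] transitions vs. shorter silence with [lit].
cooperating = phonetic_distance((base + TRADE_MS, "plit"), (base, "lit"))
# Longer silence with [lit] transitions vs. shorter silence with [plit].
conflicting = phonetic_distance((base + TRADE_MS, "lit"), (base, "plit"))

print(one_cue, cooperating, conflicting)
```

Under these assumptions the cooperating pair is separated by twice the trade,
the conflicting pair by nothing, and the one-cue pair falls in between, which
is the ordering of discriminability that the text describes.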

That the equivalence of silence and sound in the above example is owing
to phonetic processes is supported in an experiment by Best, Morrongiello, and
Robson (1981). Indeed, it is supported there more strongly than in the
experiment just described, because Best and her collaborators found that the
equivalence was manifest only when the stimulus patterns were perceived as
speech. As a first step, they performed an experiment very similar to the one
by Fitch et al., except that the stimuli were say-stay instead of slit-split,
and the transition-cue parameter was simply the frequency at which the first
formant started. With these stimuli, they obtained the identification
functions shown in Figure 11. We see there almost exactly the same kind of
trading relation between silence and formant transition that had been found in
the earlier experiment. In the manner of Fitch et al., they also tested
discrimination, finding, just as Fitch et al. had, that the two cues could be
made to cooperate or to conflict depending on their phonetic polarities. But
now they performed an experiment that proved to be particularly revealing.
Borrowing a procedure that had been used successfully for a similar purpose
(Lane & Schneider, Note 3; Bailey, Summerfield, & Dorman, 1977; Dorman, 1979),
and more recently made the object of further attention (Remez, Rubin, Pisoni,
& Carrell, 1981), they replaced the formants of the vocalic portion of the
syllable with sine waves, taking care that the sine waves followed exactly the
course of the formants they replaced. The sounds that result are perceived by
most people, at least initially, as nonspeech patterns of noises and tones.

Figure 10. Percent correct discrimination for pairs of stimuli that differ by
one cue or by two cues of the same (cooperating cues) or opposite
(conflicting cues) phonetic polarities. (From "Perceptual equivalence
of two acoustic cues for stop-consonant manner," by
H. L. Fitch, T. Halwes, D. M. Erickson, and A. M. Liberman,
Perception & Psychophysics, 1980, 343-350. Copyright 1980 by
the Psychonomic Society, Inc. Reprinted by permission.)



[Graph. Horizontal axis: silent gap duration (msec), marked from 40 to 136.]

Figure 11. Effect of silent interval on perception of /say/ vs. /stay/ for the
two settings of the transition cue. (From "Perceptual equivalence
of acoustic cues in speech and nonspeech perception," by
C. T. Best, B. Morrongiello, and R. Robson, Perception &
Psychophysics, 1981, 29, 191-211. Copyright 1981 by the Psychonomic
Society, Inc. Reprinted by permission.)
Figure 12. Effect of silent interval on identification of sine-wave analogues
of say-stay stimuli. Graph A is for those subjects who
perceived these stimuli as speech ("say-stay" listeners). Graphs B
and C are for those who perceived them as nonspeech, divided,
according to their reports of what the sounds were like, into those
who were apparently attending to the transition cue (Graph B,
"spectral" listeners) or, alternatively, the silence cue (Graph C,
"temporal" listeners). (From "Perceptual equivalence of acoustic
cues in speech and nonspeech perception," by C. T. Best,
B. Morrongiello, and R. Robson, Perception & Psychophysics, 1981,
29, 191-211. Copyright 1981 by the Psychonomic Society, Inc.
Reprinted by permission.)
But some spontaneously perceive them as speech, and others perceive them so
after it has been suggested to them that they might. It is possible, thus, to
obtain identification and discrimination functions for the same stimuli when,
in the one case, they are perceived as speech and when, in the other, they are
not. (When perceived as nonspeech the patterns are, of course, not readily
identifiable, but identification functions can be obtained by presenting, on
each trial, the target stimulus, that is, the stimulus to be identified,
together with the two stimuli at the extremes of the continuum, and then
asking the subject to say whether the target stimulus is more like one or the
other of the extremes. To insure comparability, the same procedure is used
when the subjects are perceiving the stimuli as speech.) The results are
shown in Figure 12. We see, in Figure 12a, that when the subjects were
perceiving the patterns as speech ("say-stay" listeners), their identification
functions exhibited the now familiar trading relation. But when the same
stimuli were perceived as nonspeech, then, as shown in Figures 12b and 12c,
two quite different patterns emerged, depending on whether, as inferred from
the subjects' descriptions of the sound, they were attending to the transition
cue ("spectral" listeners) or the silence cue ("temporal" listeners). It is,
of course, precisely because the subjects could not integrate the cues in the
nonspeech percept that they chose, as it were, between the one cue and the
other. In any case, both of the identification functions in the nonspeech
case are different from the one that characterizes the response to exactly the
same stimuli when they were perceived as speech. (Discrimination functions
obtained with the same stimuli were also different depending on whether or not
the stimuli were perceived as speech, nicely confirming the result obtained
with the identification measure.) Thus, with yet another method for obtaining
speech and nonspeech percepts from the same stimulus, we again find evidence
supporting the existence of a phonetic mode, and we see that the equivalence
of integrated cues is to be attributed to the distinctively phonetic processes
it incorporates.

The equivalence of sound and sight when integrated. Perhaps the most
unusual evidence relevant to the issue I have been discussing comes from a
startling discovery by McGurk and MacDonald (1976) about the influence on
speech perception of optical information about the talker's articulation.
(See also MacDonald & McGurk, 1978; Summerfield, 1979.) When subjects view a
film of a talker saying one syllable, while a recorded voice says another,
then, under certain conditions, they experience a unitary percept that
overrides the conflicting optical and acoustic cues. Thus, for example, when
the talker articulated [ga] or [da] and the voice said [ba], most subjects
perceived [da]. In that case, the effect of the optical stimulus was, at the
very least, to determine place of production. When, in a subsequent
experiment by McGurk and Buchanan (Note 4), the talker was seen to produce the
syllables [ba], [va], [ða], [da], [ʒa], [ga], [ha], while the recorded voice
said [ba] over and over again, most subjects perceived [ba], [va], [ða], [da],
[da], and then, for visual [ha], a variety of percepts other than [ba]. Here,
both place of articulation and manner of articulation were determined by the
optical input. (The difficulty of seeing farther back in the vocal tract than
[da] accounts, presumably, for the fact that visual [ʒa], [ga], and [ha] were
perceived as having generally more forward places of production.)

Having witnessed a demonstration of the McGurk-MacDonald effect, I take
the liberty of offering testimony of my own. I found the effect compelling,
but, more to the point, I would agree that McGurk and Buchanan (Note 4) have
captured my experience when they say, "...the majority of listeners have no
awareness of bimodal conflict," and then describe the percept as "unified."
Surely, my percept was unified in the important sense that I could not
have decided by introspective analysis that part was visual in origin and part
auditory. Even in those cases in which, given conflicting optical and
acoustic cues, I experienced two syllables, there was nothing about their
quality that would have permitted me to know which I had seen and which I had
heard.

By way of interpretation, MacDonald and McGurk (1978) indicate that their
results bespeak a connection between perception and production, and McGurk and
Buchanan (Note 4) echo a comment by Summerfield (1979), who observed, after
having himself performed several experiments on the phenomenon, that the
optical and acoustic signals are picked up in a "common metric of articulatory
dynamics." I would agree, though I would, of course, prefer to call the
common metric "phonetic." But a mode by any other name would bear as
weightily on the issue I have put before you, for the important consideration
is that, in any ordinary sense of modality, the speech percept is neither
visual nor auditory; it is, rather, something else.

Integration into ordered strings. Having so far considered only the
perception of individual phonetic segments, we should put some attention on
the fact that phonetic segments are normally perceived in ordered strings.
This wants explicit treatment if only because, as the reader may recall, a
characteristic of the speech code is that several phonetic segments are
conveyed simultaneously by a single segment of sound. As the reader may also
recall, it is just this characteristic of the code that enables the listener
to evade the limitation imposed by the temporal resolving power of the ear.
The further consequence for perception, which we will consider now, is that
the listener cannot perceive phonetic segment by phonetic segment in left to
right (or right to left) fashion; rather, he must take account of the entire
stretch of sound over which the information is distributed. Such an acoustic
stretch typically signals a phonetic structure that comprises several
segments. I will offer only a brief example, taken from a recent study by Repp
et al. (1978), and chosen because the relevant span happens to cross a word
boundary.

The experiment dealt with the effect of two cues, silence and noise
duration, on perception of the locutions gray ship, gray chip, great ship, and
great chip. In Figure 13 is a spectrogram of the words gray ship, with which
the experiment began. The variable, shown in the figure, was the duration of
silence between the two words. Given the result of previous research, we
knew that increasing the silence would bias away from the fricative in ship
and toward the affricate (stop-initiated fricative) in chip (Dorman, Raphael,
& Isenberg, 1980; Dorman et al., 1979). The parameter, also shown in the
figure, was the duration of the fricative noise, known from previous research
to be a cue for the same distinction: increases in duration of the noise bias
toward fricative and away from affricate (Gerstman, 1957; Dorman et al.,
1979).

In Figure 14 are the results. We see in the graph at the upper left that
when the noise duration was relatively short (62 msec), increasing the
duration of the silence caused the percept to change from ship to chip. Thus,
the effect of silence was to produce a stop-like consonant to its "right,"
much as it had done in the cases of slit-split and [sa]-[spa]-[sta] that were
dealt with earlier. But, as shown in the graph at the lower right, when the
duration of the fricative noise was relatively long (182 msec), increases in
the duration of the silence caused the perception to change, not to gray chip,
as before, but to great ship. That is, increasing the duration of the
fricative noise in ship put a stop consonant at the end of the preceding word.
The effect is superficially "right to left." But, of course, the effect is in
neither direction; it is more properly regarded as a matter of apprehending a
structure.

Figure 14. The effect of duration of silence, at each of four durations of
... affricate) manner. (From "Perceptual integration of acoustic cues ...

Given, then, that the listener must recover several phonetic segments
from the same span of sound, we ask three questions about the underlying
process. First, how does the listener delimit the acoustic span? That is,
how does he know when all the information that is to be provided has been
provided? There is, after all, no acoustic signal that regularly marks the
information boundary. Second, how does the listener store the information as
it accumulates? And, third, what does he do while he waits? Does he simply
resonate, as it were, or does he entertain hypotheses? If the latter, does he
entertain all possible hypotheses? Does he weight them according to the
likelihood they are correct? And how quickly does he abandon them as they are
proved wrong?

If these questions seem familiar to students of sentence perception, it
is, I think, because processes in the phonetic and syntactic domains do have
something in common. In both cases, information is distributed in
distinctively linguistic ways through the signal. As a consequence, the
perceiver must recover distinctively linguistic structures. To that extent,
the resemblance between processing in the two domains is not superficial. Nor
is it, if we take the vertical view of language I earlier espoused, altogether
surprising.

Afterwords, Omissions, and Prospects 

Having set out years ago to study communication by acoustic alphabets, we
might still be so occupied. For acoustic alphabets can be used for
communication (witness Morse code), and there are innumerable experiments we
could have done had we gone on trying to find the alphabet that works best.
But it is not likely, as a practical matter, that we would ever have made a
large improvement. Nor is it likely, from a scientific point of view, that we
would ever have learned anything interesting. Acoustic alphabets cannot become
part of a coherent process; I suspect, therefore, that there is nothing
interesting to be learned.

But speech was always before us, proof that there is a better way. 
Inevitably, then, we put our attention there and, in so doing, began to bark 
up the right tree. It remained only to find that speech and language require 
to be understood in their own terms, not by reference to diverse processes of 
a horizontal sort. But once the vertical view is adopted, there is little
doubt about what we must try to understand. 
There is also little doubt, at any stage of the research on speech, about
how much or how little we do understand, because there is a standard by which
progress can be measured; we are not in the position of explaining behavior
that we have ourselves contrived. Thus, to test what we think we know of the
relation between phonetic structure and sound, we have only to see how that
knowledge fares when used as a basis for synthesis. In fact, it does well
enough to enable us to synthesize reasonably intelligible speech, which
suggests that we do know something (Liberman, Ingemann, Lisker, Delattre, &
Cooper, 1959; Klatt, 1980; Mattingly, 1980). But the speech is not nearly so
good as the real thing, which proves, as if proof were needed, that we have
something still to learn. Perhaps what we must learn most generally is to
accept the hypothesis, alluded to earlier in the paper, that human listeners
are sensitive to all the phonetically relevant information in the speech
signal. If that hypothesis is true, and if the acoustic cues that convey the
information are as numerous, various, and intertwined as we now believe them
to be, then we should act on our assumption that the key to the phonetic code
is in the manner of its production. That requires taking account of all we
can learn about the organization and control of articulatory movements. It
also requires trying, by direct experiment, to find the perceptual
consequences (for the listener) of various articulatory maneuvers (by the
speaker). To do that we must, of course, press forward with the development of
a research synthesizer designed to operate from articulatory, rather than
acoustic, controls (Mermelstein, 1973; Rubin, Baer, & Mermelstein, 1981;
Abramson, Nye, Henderson, & Marshall, 1981). The perfection of such a device,
itself an achievement of some scientific consequence, will enable us to find a
more accurate, elegant, and useful characterization of the informational basis
for speech perception.

It will not have escaped notice that the claim to understanding I have
made is, in any case, a modest one. At most, we presume to know something
about what phonetic processes do, and in what ways they are distinctive and
coherent. As for mechanism, however, there is only the assumed link between
perception and production, and even there we have no certain, or even clear,
idea how such a link might be effected. If we knew more about mechanism, we
would presumably be in a better position to design automatic speech
recognizers of a nontrivial sort (Levinson & Liberman, 1981). At present,
however, we can only claim to understand where the difficulties lie. That is
an important step, to be sure, but it is only the first one, and it will
almost surely prove to be the easiest.

Since I have taken the position that speech perception depends on 
biologically specialized processes, I should, at last, acknowledge that 
neurological and developmental studies are relevant. For if phonetic 
processes are distinctive and coherent from a perceptual point of view, we 
reasonably expect that they are so from a neurological point of view as well. 
We do, then, look to neuropsychological data to provide further tests of our
hypotheses, to refine our characterizations, and, indeed, to supply new
insights into the processes themselves. As for the biology of the matter, we
must rely heavily, of course, on developmental studies of speech perception,
especially when these include very young infants and comparisons across
languages. Such studies enlighten us about what might have developed by
evolution in the history of the race, and what remains to develop, presumably
by epigenesis, in the history of the individual. Of course, neither the
neuropsychological nor the developmental studies will be useful unless we ask
the right questions. But I believe we are learning how to do precisely that.
FOOTNOTES



¹At one point we assumed that these principles were so general as to
extend to perception in all modalities. Indeed, we carried out experiments
designed to explore the possibility that patterns could be preserved across
vision and audition provided the stimulus coordinates were properly
transformed (Cooper, Liberman, & Borst, 1951).

²In contrast to the remarkable sensitivity of the phonetic mode to all
aspects of the acoustic signal that do convey phonetic information, there is
its equally remarkable insensitivity to those aspects of the signal that do
not. Thus, as is well known from many years of research on synthetic speech,
the phonetic component of the percept is usually unaffected by gross
variations in those aspects of the signal (for example, bandwidth of the
formants) that are beyond the control of the articulatory apparatus and hence
necessarily irrelevant for all linguistic purposes (Liberman & Cooper, 1972;
Remez et al., 1981). The only effect of such variations is to make the speech
sound unnatural or, in the most extreme cases, to make it impossible for the
listener to perceive the sound as speech.

³When the chirps are discriminated in isolation (that is, not as part of
the duplex percept) the function has the same shape, but the level is
displaced about 15 percentage points higher. The difference in level is
presumably owing to the distraction produced in the duplex condition by the
other side of the percept.

⁴The existence of these trading relations means that the location of a
phonetic boundary on an acoustic continuum is not fixed; within limits it will
move as the settings of the several cues are changed. The boundary will also
move, of course, as a function of phonetic context. (See the discussion,
above, of the effect of preceding context on the [da]-[ga] boundary and also,
for example, Mann and Repp, 1981; Repp and Mann, 1981.)
READING, PROSODY, AND ORTHOGRAPHY
Deborah Wilkenfeld



INTRODUCTION

Phonetic recoding effects in silent reading have been reported by a
number of investigators employing a variety of experimental techniques
(review by Conrad, 1972) and testing in several languages and orthographic
systems (Tzeng, Hung, & Wang, 1977; Erickson, Mattingly, & Turvey, 1977; Navon
& Shimron, 1981). While the presence of a phonetic representation in reading
has been convincingly demonstrated, the source of the effect and the role of
the representation remain largely unexplored. The obvious explanation, that
the effect results from a process of grapheme-to-phoneme conversion, is
falsified by evidence for phonetic recoding in reading non-alphabetic
orthographies (Tzeng et al., 1977; Erickson et al., 1977).

One strategy that might prove fruitful in untangling these puzzles is to
specify what linguistic properties are embodied in the phonetic representation
constructed by fluent readers. The presence of segmental phonetic features
has been firmly established by the studies cited above, but evidence for
suprasegmental features, such as word stress and aspects of prosody, has not
heretofore been sought, though readers' subjective reports suggest that these
features are also present. Kleiman (1975) demonstrated an important role for
phonetic recoding in the comprehension of written sentences, and since
suprasegmentals have been shown to play a role in the perception of spoken
utterances, evidence for suprasegmentals in the phonetic representation of
written language (which itself marks only the grossest suprasegmental
properties of sentences) would be tantalizing evidence for a model of reading
based on a strong dependency of reading on speech perception.

In a small pilot experiment using the response bias technique (Mehler &
Carey, 1967), the study reported here sought evidence that subjects encode
word stress in silent reading on the level of the single word.



+Also University of Connecticut.

Acknowledgment. This work was supported by NICHD Training Grant HD 00321 to
Ignatius G. Mattingly and Donald Shankweiler at the University of
Connecticut. I am indebted to Joseph Kupin for his help with the statistical
analysis of data and for the computer program used in the experiment; to Lyn
Frazier for her suggestions in the early development of the method employed;
and to Janet Fodor and Ignatius Mattingly for their patient support and many
helpful discussions throughout the project.

[HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)]
STIMULI



Test items in this experiment were ten words chosen from among those
English disyllabic homographs whose syntactic class depends on the placement
of primary stress. For example, content is a noun when the first syllable is
stressed and an adjective (or reflexive verb "to content oneself") when the
second syllable is stressed. Similarly, permit is a noun when the first
syllable is stressed and a verb when the second syllable is stressed. The
orthography does not represent the location of word stress for these words;
presumably in normal circumstances, sentential context provides the necessary
information for choosing in these few ambiguous cases.

Test stimuli were lists composed of eight unambiguously stressed disyllabic
words and a ninth, final word taken from the set of homographs. All of the
unambiguous words in a single list were matched for placement of primary
stress (i.e., all had first syllable stress or all had second syllable stress)
but were of varied syntactic and semantic classes.

Test lists were embedded in a series of foil lists consisting of from
eight to eleven words chosen at random. The ratio of foil sets to test sets
was 7:1, yielding 80 lists.
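
The list counts above can be checked with a short sketch. It assumes ten test lists (one per homograph), which is what a 7:1 ratio and a total of 80 lists imply. The helper name `make_session` and the random ordering of lists are illustrative assumptions; the text does not say how test and foil lists were sequenced.

```python
import random

def make_session(n_test_lists=10, foils_per_test=7, seed=0):
    """Return a shuffled sequence of list labels for one session (hypothetical helper)."""
    labels = ["test"] * n_test_lists + ["foil"] * (n_test_lists * foils_per_test)
    rng = random.Random(seed)
    rng.shuffle(labels)  # assumption: the source does not describe the ordering
    return labels

session = make_session()
print(len(session), session.count("test"), session.count("foil"))
```

With the default arguments the session holds 10 test lists and 70 foil lists, for 80 lists in all.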

In a pretest of the test stimuli, 20 subjects were asked to read aloud a
list of 300 English words, among which the test words were embedded. Their
assignment of stress for the homographs was recorded. Responses to this
pretest were used as a baseline measure of preference in the experiment.
Results appear in Table 1, Column A. Each test homograph was preceded in the
main experiment by a list that shared the stress pattern of its less-preferred
reading.



SUBJECTS 

Subjects were 18 undergraduate volunteers enrolled in introductory
linguistics courses at the University of Connecticut. All were native speakers
of English. They were paid for their participation.

PROCEDURE 

Subjects were told that the purpose of the main experiment was to measure
the effect of reading rate on accuracy of recall. Each subject was tested
separately. The subject was seated in front of a computer-controlled CRT
screen on which appeared, for each trial, a vertical list of eight to eleven
words. The subject was instructed to read each word on the list silently from
top to bottom, as quickly as possible without missing any of the words, and to
signal the experimenter when he or she was finished by reading the last word
on the list out loud. The list on the screen then disappeared and was
replaced by a single word. The subject was instructed to respond "yes" if the
word was on the preceding list and "no" if it was not. This probe word was
never one of the homographs. Subjects' spoken responses were tape-recorded
for transcription later. The entire presentation took approximately fifteen
minutes.
RESULTS



The results of this experiment are summarized in Table 1. Column A gives
the percentage of times that the less-preferred stress pattern for each
ambiguous item was given as a response in the pretest and notes whether the
less-preferred reading was as a noun (with first syllable stress) or as a verb
(with second syllable stress). Column B gives the percentage of the trials in
which the less-preferred stress pattern was elicited in the biasing condition.
The number of subjects is given in parentheses in this column. Column C gives
the percentage of trials in which the less-preferred pattern was elicited from
subjects who answered the word recognition question correctly for that test
list. The number of subjects who answered correctly appears in parentheses.


Table 1

% Less Preferred Stress

                  A                   B                  C
ITEM        PRETEST (N=20)      BIAS CONDITION     BIAS CONDITION
                                                   (MEMORY QUESTION
                                                   CORRECT)

conduct     10 (initial)        72* (18)           82* (11)
object      20 (final)          17  (18)           13  (15)
pervert     40 (initial)        77  (18)           77  (18)
present     30 (final)          28  (18)           29  (14)
digest      20 (initial)        39  (18)           38  (16)
progress    40 (final)          33  (18)           29  (14)
permit      20 (initial)        33  (18)           46  (11)
subject     30 (final)          33  (18)           33  (18)
incline     10 (initial)         0  (17)            0  (17)
project     30 (initial)        53  (17)           56  (17)


Comparison of Columns A and B indicates an effect of the biasing lists on
the stress pattern of the ambiguous test items. In a Wilcoxon one-tailed
test, this difference was significant at the .05 level.
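
For readers unfamiliar with the test, the following minimal sketch shows how the rank sums of a Wilcoxon signed-rank test are formed from paired percentages. It uses the item-level values of Columns A and B as printed in Table 1. The text does not say whether the original test was computed over items or over subjects, so the numbers illustrate the procedure only and are not a reproduction of the reported result. The function name `signed_rank_sums` is hypothetical.

```python
def signed_rank_sums(before, after):
    """Return (W_plus, W_minus) for paired samples, using average ranks for ties."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Item-level percentages as printed in Table 1 (conduct ... project).
pretest = [10, 20, 40, 30, 20, 40, 20, 30, 10, 30]  # Column A
biased = [72, 17, 77, 28, 39, 33, 33, 33, 0, 53]    # Column B
w_plus, w_minus = signed_rank_sums(pretest, biased)
print(w_plus, w_minus)
```

In the usual procedure, the smaller of the two rank sums is compared with a tabled critical value for the number of non-zero pairs.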


The biasing effect becomes even more apparent if we take into account 
subjects' performance on the recognition test. Column C gives the results
just for subjects who answered the memory question correctly for the list in
question. Comparison of Columns A and C shows a significant difference at the
.01 level.

A further indication that the biasing manipulation was responsible for 
the effect observed is that a strong correlation (r = .81) was found between
performance on the recognition task and number of shifted responses,
accounting for 66% of the variation between subjects. This correlation is
graphed in
Figure 1. The graph shows a wide range of subject performance. If we look at 
both ends of this range, at the two least successful and the two most 
successful subjects, we find that where performance on the memory task was 69- 
70 per cent, subjects gave the less-preferred reading only 20 per cent of the 
time, while the two subjects who answered 88 per cent of the recognition 
questions correctly gave the less-preferred reading 60 per cent of the time. 
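
The 66% figure is the squared correlation coefficient. Assuming r = .81, the value consistent with the stated percentage, the arithmetic is:

```latex
r^2 = (0.81)^2 = 0.6561 \approx 66\%
```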



DISCUSSION 

The correlation found is open to two interpretations. Under one
interpretation, a subject's success in the recognition task is attributable to
the amount of attention paid to the task. The more attentive subjects were
more likely to have thoroughly read the word lists; thus they were more likely
to have recoded the items on the list, and so to have been primed by
properties of the code.


Under another more interesting interpretation, the more successful sub- 
jects did more phonetic recoding, as evidenced by the high likelihood that 
they would be primed by a phonetic property of the word lists. An incidental
result of this recoding was the ability to better remember what they had read,
and thus better performance on the recognition test.

Under the first of these interpretations, attention rather than the 
requirements of the reading task per se is what determines performance on the 
recognition test; the evidence found for mental representation of prosody is a 
by-product of a process, i.e., constructing the phonetic representation, which
is perhaps just one of several representations constructed incidentally in the
course of performing the experimental task. 

Under the second interpretation, phonetic recoding is an integral part of
good reading, and so if people are reading well, they must be constructing a
phonetic representation. This will then prime pronunciation of the ambiguous
item in the absence of contextual cues. The availability of the phonetic
representation incidentally facilitates performance on the recognition task.
Better recognition results from greater ease of access to or more completeness
of the phonetic representation, which may in turn indicate superior reading
ability.


The first (attention) explanation suggests that any number of codes
results from attending to the list, and does not give any reason to attribute
special status to any code. Thus we should expect semantic and orthographic
codes, for instance, to affect subjects' performance similarly to the phonetic
code in memory tasks of the sort used in this experiment. The pattern of
results reported for a similar task employed by Erickson et al. (1977)
suggests that this is not the case; the orthographic and semantic properties
of their word lists did not affect performance in a short-term recall task in
the same way that the phonetic properties did.

It should be noted that the response shift was not equal for all the
items tested. While a large effect was obtained for the words digest, permit,
project, conduct and pervert, other items (object, present, progress)
exhibited little effect (or even a reverse effect). Incline is the clearest
case: In no trial was it possible to bias a subject in the test situation to
pronounce incline as a verb, with second syllable stress. The averages given
in Column A are for preferred pronunciations across twenty subjects. These
figures indicate that one pronunciation of incline, for example, was preferred
over the other by eighteen subjects out of twenty. What they do not indicate
is how strong each individual's preference is. Though the former is much
easier to measure, it provides only a very rough estimate of the latter, which
is, of course, what is really relevant to the biasing experiment. The failure
of the biasing manipulation for incline may well be due to the fact that while
approximately one person out of ten prefers it as a noun, most people may have
it in their lexicons only as a verb. For these people, its stress pattern
would be completely unshiftable, however psychologically real stress patterns
are in reading. This suggests that for this kind of experiment it would be
quite proper, and indeed optimal, to select words whose baseline frequency is
about equal between noun and verb.

The objection might be made that the effect found in the present
experiment is merely an artifact of the particular task employed, rather than
a reflection of normal reading processes. To make this claim is to say that
subjects employed strategies in the performance of this task that were
constructed ad hoc for this purpose. But there is no logical requirement for
such a strategy to include the construction of a phonetic representation; on
the face of it, a visual representation would suffice. Nor is there any
reason to expect all subjects to arrive at the same kind of special strategy.
Yet the more successful subjects employed a phonetic coding strategy, while
those subjects who could not do this did not seem to find another strategy
that was similarly effective. Thus it appears that subjects were making the
best use they could of reading skills that were already available for more
ordinary purposes.

While it might be argued that the phonetic effects found by Conrad (1964)
and Baddeley (1966), for example, and in the present experiment are due to
rehearsal strategies for short-term recall, which have been shown to employ a
phonetic representation (see Baddeley, 1976, Chapter 8, for discussion), this
argument does not apply to effects found in the acceptability judgment task
employed by Kleiman (1975), which did not require rehearsal. Thus the
construction of a phonetic representation cannot be viewed as a mere artifact
of rehearsal.

It could also be argued that for semantically integrated sentences,
readers might use a semantic code, and employ a phonetic code to facilitate
memory only when the items in the experimental sequence do not cohere
semantically. The findings of Baddeley and Hitch (1974) address this
criticism. They compared reaction times in a grammaticality judgment task
using ordinary sentences and sentences composed of phonetically similar
(rhyming) words. Phonetic similarity increased response latencies to
grammatical and ungrammatical sentences. This task does not involve rehearsal
or short-term memory. But it does implicate the parser, lending support to the
conclusion from Kleiman's study that the sentence parsing mechanism requires a
phonetic representation, quite apart from any requirements of short-term
memory. If subjects construct a fairly detailed phonetic representation in a
relatively unnatural situation in which it affords them no apparent advantage,
we might also expect them to do it in a more natural situation. In other
words, if subjects encode prosody when they read lists of words silently in a
task that does not require comprehension, then it is likely that they will
also encode prosody when they read ordinary sentences in a task that
necessarily invokes the higher level processing involved in comprehension.

An important finding from this experiment is that readers construct a
mental representation that includes features not represented in the stimulus.
Thus, while it might be maintained that readers of English represent the
segmental features of the words they read just because these can be extracted
by rule from the letters of the orthographic system (at least in most cases),
no such claim can be made for suprasegmental features such as stress, for
there are no symbols in English orthography that indicate stress. In the
stress-neutral pretest condition, subjects were always able to name the
homographs. That this was not accomplished by simply applying rules to
translate from orthography to phonology is strongly suggested by the fact that
not all words having the same orthographic structure were consistently
assigned the same pattern of stress by a single subject. More likely, a bias
of some sort, due to factors such as frequency of occurrence, was responsible
for a subject's choice in each case. Such a bias could only come from the
lexicon. This is true in the case of vowel quality in homographs (lead, bow)
as well. For these words, at least, naming written words must follow lexical
access.

This must always be the case in naming Chinese logographs and Japanese
kanji. These orthographic systems give very little phonological information,
yet reading lists of words written in these orthographies results in a
phonetic representation in short-term memory (Tzeng et al., 1977; Erickson et
al., 1977). Thus almost all phonetic information must be supplied by the
reader after lexical access.

Further support for the active participation of the lexicon in reading is
provided by Hebrew. The Hebrew language is represented by an alphabetic
orthography that keeps the vowel symbols fairly well separated from the
consonant symbols. In texts intended for fluent adult readers, the vowel
information is usually omitted entirely. However, it is the vowels of Hebrew
that represent the inflectional system and carry most of the morphological and
syntactic information. The task of the reader in Hebrew is to decide,
presumably in the course of parsing procedures, the syntactic role of each
word, and its morphological composition in that role. Having derived this
information, there is no reason to expect the reader of Hebrew to then add
information about the vowels that would represent the word in speech. But the
results of a study by Navon and Shimron (1981) suggest that they do indeed do
so. Their subjects read lists of morphologically simple (uninflected) words
in which vowel phonemes were represented by the optional vowel diacritics.
Latencies in lexical decision tasks were increased by phonemically anomalous
diacritics but not by graphemically anomalous diacritics that preserved the
phonology. The effect could not be attributed to visual factors.

Their results suggest that in the simple case of reading unambiguous
uninflected words, with no concurrent processing demands such as those
required for sentence comprehension, subjects both construct a phonetic
representation and access the lexicon. (In this case, lexical access appears
to follow grapheme-to-phoneme translation. However, there is ample evidence,
as Navon and Shimron point out, for models of lexical access that include a
visual route. In any case, the result is a phonetic representation.) Yet
Kleiman's results suggest that it is just in those cases in which processing
for comprehension is required that the phonetic representation is important.
In the case of fluent readers of Hebrew in the ordinary situation of reading
text, the construction of a phonetic representation is at least as likely to
occur as in the simple case of lexical decision. However, here the
construction of the phonetic representation must follow lexical access, as
with English homographs, Chinese logographs, and Japanese kanji. But with
Hebrew, it is also likely to be the case that the phonetic representation is
the product of the parser, rather than of the lexicon, since it is the
analysis resulting from the parsing process that indicates to the reader what
the morphology of the word must be, and thus what vowels must be supplied.

The facts about Hebrew, on the one hand, and English, Chinese and
Japanese, on the other hand, suggest two hypotheses to account for the effect
found in the present experiment. Under one hypothesis, which I will call the
lexical bias hypothesis, prosodic priming is a result of activity in the
lexicon. There is evidence that stress (or some abstract representation from
which stress can be derived by rule [Chomsky & Halle, 1968]) is a feature of
lexical entries (Brown & McNeill, 1966), just as segmental phonological
features and semantic features are. As such, stress can probably be primed
similarly to semantic features (Meyer, Schvaneveldt, & Ruddy, 1975). As the
activation of a single word may activate any number of lexical entries in the
same semantic field, the activation of a single disyllable with first-syllable
stress might activate (if slightly) all disyllables having first-syllable
stress. The activation of nine such words may have the cumulative effect of
activating the first-syllable-stressed entry for the homograph to a point
where it is much more readily available than the second-syllable-stressed
entry, and thus more likely to be reported in the priming situation.

The second hypothesis, suggested by the facts about Hebrew, may be called
the parsing hypothesis. According to this hypothesis, even isolated words are
parsed; that is, they are processed as one-word sentences (see Mattingly, Note
1). It is in the parser that the morphophonemic representation retrieved from
the lexicon is assigned a phonetic representation. This type of model is well
suited to an orthography such as Hebrew. In fact, if it is assumed that the
entire linguistic system, of which word recognition is only a part, is
designed for the processing of linguistic structures, this type of model is
equally well suited to English and any other language. The prosodic priming
effect can then be seen as the result of a bias induced in the parser as it
constructs a complete phonetic representation, including prosody, for each of
a series of one-word sentences. A small bit of evidence in support of this
hypothesis for English is the apparent ease with which sentences containing
homographs are read: In syntactic context, the grapheme sequence
p-r-o-g-r-e-s-s (for example) may be instantly recognized as a noun or a verb
as a result of information derived by the parser. The entire analysis of the
sentence up to the point where the homograph is encountered determines what
syntactic categories are likely to occur in a well-formed structure and guides
lexical access to the appropriate entry, yielding, ultimately, the appropriate
phonetic representation.



REFERENCE NOTE

1. Mattingly, I. G. On the nature of phonological representations.
   Manuscript in preparation, 1981.



REFERENCES 

Baddeley, A. D. Short-term memory for word sequences as a function of
    acoustic, semantic and formal similarity. Quarterly Journal of
    Experimental Psychology, 1966, 18, 362-365.
Baddeley, A. D. The psychology of memory. New York: Basic Books, 1976.
Baddeley, A. D., & Hitch, G. Working memory. In G. H. Bower (Ed.), The
    psychology of learning and motivation (Vol. 8). New York: Academic
    Press, 1974.
Brown, R. H., & McNeill, D. The tip of the tongue phenomenon. Journal of
    Verbal Learning and Verbal Behavior, 1966, 5, 325-337.
Chomsky, N., & Halle, M. The sound pattern of English. New York: Harper &
    Row, 1968.
Conrad, R. Acoustic confusions in immediate memory. British Journal of
    Psychology, 1964, 55, 75-84.
Conrad, R. Speech and reading. In J. Kavanagh & I. Mattingly (Eds.),
    Language by ear and by eye: The relationships between speech and
    reading. Cambridge, Mass.: MIT Press, 1972.
Erickson, D., Mattingly, I. G., & Turvey, M. T. Phonetic activity in reading:
    An experiment with Kanji. Language and Speech, 1977, 20, 384-403.
Kleiman, G. M. Speech recoding in reading. Journal of Verbal Learning and
    Verbal Behavior, 1975, 14, 323-339.
Mehler, J., & Carey, P. Role of surface and base structures in the perception
    of sentences. Journal of Verbal Learning and Verbal Behavior, 1967, 6,
    335-338.
Meyer, D. E., Schvaneveldt, R. W., & Ruddy, M. G. Loci of contextual effects
    on visual word-recognition. In P. M. A. Rabbitt (Ed.), Attention and
    performance V. London: Academic Press, 1975, 98-118.
Navon, D., & Shimron, J. Does word naming involve grapheme-to-phoneme
    translation? Evidence from Hebrew. Journal of Verbal Learning and
    Verbal Behavior, 1981, 20, 97-109.
Tzeng, O. J. L., Hung, D. L., & Wang, W. S-Y. Speech recoding in reading
    Chinese characters. Journal of Experimental Psychology: Human Learning
    and Memory, 1977, 3, 621-630.






CHILDREN'S MEMORY FOR RECURRING LINGUISTIC AND NONLINGUISTIC MATERIAL IN
RELATION TO READING ABILITY*



Isabelle Y. Liberman,+ Virginia A. Mann,++ Donald Shankweiler,+ and Michelle
Werfelman+



Abstract. Good beginning readers typically surpass poor beginning
readers in memory for linguistic material such as syllables, words,
and sentences. Here we present evidence that this interaction
between reading ability and memory performance does not extend to
memory for nonlinguistic material like faces and nonsense designs.
Using an adaptation of the continuous recognition memory paradigm of
Kimura (1963), we assessed the ability of good and poor readers in
the second grade to remember three different types of material:
photographs of unfamiliar faces, nonsense designs, and printed
nonsense syllables. For both faces and designs, the performance of
the two reading groups was comparable; only when remembering the
nonsense syllables did the good readers perform at a significantly
superior level. These results support other evidence that
distinctions between good and poor beginning readers do not turn on
memory per se, but rather on memory for linguistic material. Thus they
extend our previous finding that poor readers encounter specific
difficulty with the use of linguistic coding in short-term memory.



The performance of good beginning readers on certain language-based 
short-term memory tasks, like their performance on many other language-related 
tasks, tends to be better than that of children who encounter difficulty in 
learning to read. The association between reading ability and such short-term
memory skills is by now well-documented. For example, children who are good
readers tend to have a better memory for strings of written or spoken letters
(Shankweiler, Liberman, Mark, Fowler, & Fischer, 1979). They are also more
successful at recalling strings of spoken words, and even at recalling the
words of spoken sentences (Mann, Liberman, & Shankweiler, 1980).

However, our concern has been not simply to document this performance
difference but instead to uncover the probable cause of the difference. We
first approached this problem by turning what appeared to us to be the special
advantages of good readers against them. Since we knew that for adults, the
presence of a high density of phonetically-confusable items hinders the use of
speech-related processes in short-term memory, we were led to examine the
effect of the same manipulation on the performance of good and poor readers.
We found that like adults, good beginning readers appear to make effective use
of phonetic coding in short-term memory, whereas poor readers do not. Thus we
have shown that the memory performance of good readers falls sharply, even to
the level of that of the poor readers, when they are asked to remember a
letter string, word string, or sentence containing a high density of
phonetically-confusable items (letters with rhyming names, or words that rhyme
with one another), whereas the performance of poor readers remains little
changed by this type of material.

At this point in our investigations, we were led to ask whether there are
any other differences between the short-term memory capacities of good and
poor readers, beyond those that reflect differential use of a speech code.
After all, studies of patients with lateralized brain disease have revealed
that verbal and non-verbal short-term memory abilities may be relatively
independent (see, for example: Kimura, 1963; Milner & Taylor, 1972;
Warrington & Shallice, 1969). Hence it seemed at least possible that the
ability of poor readers to use nonverbal short-term memory processes could be
equal to that of good readers. While this possibility is supported by findings
that good and poor readers are equally successful at remembering unfamiliar
(Hebrew) orthographic designs (Vellutino, Steger, Kaman, & DeSetto, 1975), it
might seem inconsistent with findings that good readers surpass poor readers
in remembering abstract figural patterns (Morrison, Giordani, & Nagy, 1977)
and spatial-temporal patterns (Corkin, 1974). In our opinion, however,
neither of these latter findings can be regarded as conclusive evidence that
poor readers have difficulty with nonlinguistic short-term memory, per se,
since both derive from materials that lend themselves to verbal labeling and
to the use of linguistic memory strategies (Liberman, Mark, & Shankweiler,
1978). Therefore, it remained to be determined whether or not poor readers
encounter difficulty with memory processes other than those requiring use of a
speech code. We sought to investigate this question in the present study by
comparing the ability of good and poor readers to remember linguistic material
with their ability to remember material that is not only nonlinguistic but
also not readily susceptible to linguistic coding.

Our subjects were good and poor readers in a second-grade classroom
whose memory abilities were tested with an adaptation of the continuous
recognition memory paradigm of Kimura (1963). Using that paradigm, we
assessed the children's ability to remember each of three types of materials:
nonsense designs, photographs of unfamiliar faces, and printed CVC nonsense
syllables. Whereas the nonsense designs were those employed in Kimura's
original study (1963), the facial photographs and nonsense syllables were our
own innovation. Studies of adult patients with focal brain damage reveal that
the ability to encode and remember the nonsense designs that Kimura employed
suffers as a consequence of right hemisphere temporal lobe excision but is
relatively unimpaired by comparable excisions to the left, language-dominant,
hemisphere (Kimura, 1963; Milner, 1974; Milner & Teuber, 1968). Likewise, the
ability to encode and subsequently to recognize unfamiliar faces has been
determined to be a right-hemisphere capacity that does not demonstrably depend
on the language mediation skills of the left hemisphere (Leehey, Carey,
Diamond, & Cahn, 1978; Yin, 1970). In contrast, the encoding and recognition
of English-like nonsense syllables is a linguistic ability that suffers as a
consequence of damage to the left hemisphere (Coltheart, 1980; Patterson &
Marcel, 1977; Saffran & Marin, 1977).

We anticipated that the results obtained with good and poor readers in
the case of nonsense designs and faces would differ from those obtained with
nonsense syllables. Good readers were not expected to surpass poor readers in
memory for either the nonsense designs or the faces, since neither of these
sets of items lends itself readily to the use of language coding. In the
event, however, that good readers should excel at recognizing either of these
materials, it would be taken as evidence that the poor readers do indeed have
broader deficiencies in remembering. We expected good readers to surpass poor
readers in memory for nonsense syllables, on the assumption that their use of
phonetic coding as a mnemonic device would be superior to that of poor
readers.

METHOD

Subjects 

The subjects in this experiment were 36 second-grade children who
attended the public schools in Mansfield, Connecticut. An initial pretest
group was selected on the basis of the children's Total Reading Score on the
Stanford Achievement Tests, which had been administered earlier in the same
school year. Candidates for the good reading group had received grade scores
of from 3.1 to 5.0, whereas candidates for the poor reading group had received
scores of 1.5 to 2.1. Final selection of 18 good readers and 18 poor readers
was made on the basis of scores on the Word Recognition Subtest of the Wide
Range Achievement Test (WRAT) (Jastak, Bijou, & Jastak, 1965). Children
selected as good readers had WRAT reading grade equivalents ranging from 3.1
to 5.0, with a mean score of 4.0; children selected for the poor reading group
received grade equivalents from 1.5 to 2**, with a mean score of 2.1.

Mean ages for good and poor readers were 9**.Q months and 9**. 2 months,
respectively, and were not significantly different. Individual administration
of the WISC-R revealed good readers to have a mean Full Scale IQ of
with mean Verbal and Performance IQ's of 112.1 and 112.9, respectively. Poor
readers received mean Full Scale IQ of 10?. 7, with Verbal and Performance IQ's
of 1011. 9 and 109.1, respectively. There were no significant differences
between good and poor readers on any of the IQ measures.

Materials 

There were three different types of materials: nonsense designs, faces,
and syllables. The tests using these three types of items were identical in
manner of construction and presentation, each modeled on Kimura's (1963)
recurring recognition memory task.
Nonsense designs. There were 80 nonsense-design stimuli, each of which
was one of the 52 irregular line drawings of Kimura (1963). Four of the
designs were used eight times each (the recurring designs), and the remaining
48 once each (the nonrecurring designs). Each stimulus was drawn on a 3 x 5
card. For the purpose of testing, the stimuli were divided into eight sets of
ten; within each set of ten, the four recurring designs were randomly
interspersed with six of the nonrecurring designs. The first set of ten
stimuli constituted the inspection set; the remaining seven sets contained the
actual test stimuli.
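
The composition of the stimulus deck described above can be sketched in a few lines of Python. The sketch assumes that items are identified by integers and that the nonrecurring items are dealt out six per set without replacement. The function name `build_deck` and the seeded random generator are illustrative; they are not taken from Kimura (1963) or from the present study.

```python
import random

def build_deck(n_items=52, n_recurring=4, n_sets=8, set_size=10, seed=0):
    """Return n_sets lists; each holds every recurring item plus unique fillers."""
    rng = random.Random(seed)
    items = list(range(n_items))
    rng.shuffle(items)
    recurring, fillers = items[:n_recurring], items[n_recurring:]
    per_set = set_size - n_recurring  # six nonrecurring items per set
    deck = []
    for s in range(n_sets):
        cards = recurring + fillers[s * per_set:(s + 1) * per_set]
        rng.shuffle(cards)  # recurring items randomly interspersed within the set
        deck.append(cards)
    return deck, recurring

deck, recurring = build_deck()
inspection_set, test_sets = deck[0], deck[1:]
```

With the default arguments this yields 80 cards in all: the four recurring items appear once in each of the eight sets, and each of the 48 remaining items appears exactly once.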

Faces. Face recognition stimuli were constructed using 52 black and
white photographs, half of which were adult female faces and half adult male
faces. In both the male and female stimuli sets, half were photographed
looking to the left and half looking to the right. To minimize distinguishing
details that might lend themselves to verbal labeling, no faces were used that
displayed hair, eye-glasses, jewelry or distinctive markings such as scars,
distinctive makeup, etc. In addition, a uniform mask was applied to each
picture to cover hair and background detail as well as to ensure a uniform
size.

Again, a set of 80 stimuli was constructed. Four photographs occurred
eight times each (two male faces and two female faces, two looking to the left
and two looking to the right) whereas the remaining 48 occurred once each.
The stimuli were divided into eight sets of ten each, with each set containing
the four recurring photographs randomly interspersed among six nonrecurring
ones. The first set served as the inspection set; the remaining seven sets
contained the test stimuli.

Nonsense syllables. Stimuli for this part of the experiment were
constructed from a set of 52 CVC nonsense syllables that had been selected
from Hilgard (1962) to have a moderately low association value. Across the
different syllables, frequency of occurrence of each letter was controlled as
much as possible. The vowels a, e, and u appeared 11 times each, i appeared
nine times, and o appeared ten times. Every consonant (with the exception of
[illegible] in initial position and g, h, and w in final position) occurred
at least once, with some consonants occurring as often as six times.

From the syllables, a set of 80 stimuli was constructed. Four of the
stimuli occurred eight times, while each of the remaining 48 occurred once.
The stimulus cards were again divided into eight sets of ten each; within each
set of ten, the four recurring syllables were randomly interspersed with six
nonrecurring ones. The first set of ten constituted the presentation trials;
the remaining seven sets contained the test stimuli.
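
The construction scheme shared by all three stimulus types can be sketched as follows. This is only an illustrative reconstruction: the function name, the use of a seeded shuffle, and the random choice of which items recur are assumptions, not details reported for the study.

```python
import random


def build_recurring_sets(items, n_recurring=4, n_sets=8, set_size=10, seed=0):
    """Split items into sets that each mix the recurring items with unique fillers.

    Hypothetical helper: the source does not say how the recurring items
    were chosen or how the cards were shuffled.
    """
    rng = random.Random(seed)
    pool = list(items)
    rng.shuffle(pool)
    recurring = pool[:n_recurring]
    fillers = pool[n_recurring:]
    per_set = set_size - n_recurring          # six nonrecurring items per set
    if len(fillers) < per_set * n_sets:
        raise ValueError("not enough nonrecurring items")
    sets = []
    for k in range(n_sets):
        block = recurring + fillers[k * per_set:(k + 1) * per_set]
        rng.shuffle(block)                    # intersperse recurring items
        sets.append(block)
    return recurring, sets


# 52 source items -> 80 presentations: one inspection set plus seven test sets
recurring, sets = build_recurring_sets(range(52))
inspection_set, test_sets = sets[0], sets[1:]
```

With 52 items, the four recurring items appear in every set (eight times in all) and the remaining 48 appear once each, giving the 80 presentations described above.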

Procedure 

Each child was tested individually, with the nonsense designs being
presented on the first day of testing, and the faces and syllables on a second
day. The procedure for the recurring recognition memory paradigm was adapted
from Kimura (1963) and was the same for all three types of material.

The experimenter began each test by telling the child that some designs
(or faces or syllables) would be shown, one at a time, and that the task was
to look at each one very carefully and try to remember it. She then presented
the inspection set of ten cards, showing each card for approximately three
seconds. Subsequently, the child was told that more cards would follow, some
of which would be identical to those presented in the inspection set, and some
of which would be new cards. The instruction was to say "Yes" if a card had
been seen before, and "No" if it had not. The test items were then presented
to the child, who was required to respond to each one before being shown the
next.

RESULTS 

In order to evaluate the performance of the subjects, we first computed
the percentage of correct responses made by each subject, separately for each
of the three types of materials (nonsense designs, faces, and syllables).
This was done by summing the number of correct recognitions and correct
rejections, and dividing by 70 (i.e., the total number of test items presented
in each condition). After first noting that the performance of the subjects
on all three types of material was consistently above the chance level of 50
percent correct, we turned to the major purpose of our study, which was to
evaluate the extent of difference between the performance of good and poor
readers on each of the three different types of items.
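
The scoring rule just described amounts to the following computation. The sketch is illustrative only; the function name and the boolean encoding of responses are assumptions.

```python
def percent_correct(said_yes, was_old):
    """Percentage of test items answered correctly.

    A response is correct when "Yes" is given to a recurring (old) item
    (correct recognition) or "No" to a new item (correct rejection).
    """
    if len(said_yes) != len(was_old):
        raise ValueError("one response per test item is required")
    correct = sum(r == o for r, o in zip(said_yes, was_old))
    return 100.0 * correct / len(was_old)


# Seven test sets of ten cards: 4 recurring and 6 new items per set
was_old = ([True] * 4 + [False] * 6) * 7
all_yes = [True] * 70
```

Under this encoding, a child who answered "Yes" to every card would score `percent_correct(all_yes, was_old)`, i.e., 40 percent, because only 28 of the 70 test items are recurring.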

The results of an ANOVA computed on the variables of reading ability
(good versus poor readers) and material type (designs versus faces versus
syllables) revealed a significant effect of material type, F(2,68) = 73.3,
p < .001, reflecting the fact that designs and faces were typically harder to
remember than syllables. There was further the anticipated interaction
between the effect of item type and reading ability, F(2,68) = 8.3, p < .001. As
can be seen in Figure 1, good readers were not significantly better than poor
readers at remembering either nonsense designs or faces. (For nonsense
designs, t(34) = 1.4, p > .1; for faces, t(34) = 0.1, p > .6.) In fact, poor readers
were slightly (although not significantly) better at remembering nonsense
designs. Good readers, however, were significantly better than poor readers
at remembering the nonsense syllables, t(34) = 3.2, p < .005.

DISCUSSION 

The results, then, upheld our predictions. Poor readers were equal to
good readers in ability to remember both nonsense designs and faces. In
contrast, poor readers made significantly more errors than good readers in
recognizing the nonsense syllables. Thus we find no evidence that children in
the two reading groups differ in general memory ability. Rather, we again
find them to differ only in memory for linguistic items. These findings help
us to place in perspective two claims that are frequently made regarding the
origins of many childhood reading problems. One claim sees a "general memory
deficit" as central (Morrison et al., 1977). According to that hypothesis,
which views poor readers as having difficulty with memory, per se, poor
readers might be expected to show inferior performance for linguistic material
and figural material alike. Clearly, our results are incompatible with this
view, since it was found that good and poor readers differed solely in memory
for the syllables.

Figure 1. Mean percentage of correct responses made by good and poor readers
on nonsense designs, faces, and nonsense syllables.

A second theoretical claim suggests that failure of serial order memory
is the core problem (Bakker, 1972; Corkin, 1974; Holmes & McKeever, 1979).
Our task did not require that subjects remember the order of items in the
inspection set, yet we nonetheless obtained a difference between good and poor
readers' ability to remember nonsense syllables. Thus the poor readers'
memory problem goes beyond serial order alone. In this respect, the present
findings confirm earlier results by Mark, Shankweiler, Liberman, and Fowler
(1977) and Byrne and Shea (1979). We do realize, however, that a
material-specific deficit in order memory could be a consequence of failure to
make effective use of phonetic coding. Indeed, in a recent study (Katz,
Shankweiler, & Liberman, Note 1) some of us found that good and poor readers
selected by the same criteria as in the present study differed in ability to
recall the order of the items. But the good readers excelled only when their task
was to recall the order of items that could be coded in terms of linguistic
labels. No difference was found in memory for the order of nonrecodable
items. Thus the problems of poor readers in recall of items, per se, and in
recall of item order appear to be linked to some difficulty with using a
phonetic code: either a failure to recode phonetically or a weakened tendency
to use this coding principle.

In summary, then, we have discovered an instance in which, despite
identical procedures, good and poor readers differ in the ability to remember
language-based material, but fail to differ in memory for two types of
nonverbal material. Thus we conclude that the short-term memory deficits of
poor readers appear indeed to be restricted to the domain of phonetic
representation in short-term memory. Several questions arise at this point,
among them the question of why poor readers fail to make effective use of a
phonetic code, and the question of how a deficient linguistic memory comes to
be associated with problems in learning to read. At present we are addressing
the first of these questions by examining the pattern of memory errors made by
poor readers. Our approach to the second, however, is guided by a
consideration of the relation between short-term memory and normal language
processing (Baddeley, 1978; Liberman, Mattingly, & Turvey, 1972), which leads
us to ask whether poor readers encounter difficulty on the type of language
comprehension tasks used in studying aphasic patients (Caramazza & Zurif,
1978). We suspect that answers to these two questions may bring us closer to an
understanding of the reading process as well as of the process of reading
acquisition.

PHONETIC AND AUDITORY TRADING RELATIONS BETWEEN ACOUSTIC CUES
IN SPEECH PERCEPTION: PRELIMINARY RESULTS



Bruno H. Repp



Abstract. When two different acoustic cues contribute to the
perception of a phonetic distinction, a trading relation between the
cues can be demonstrated if the speech stimuli are phonetically
ambiguous. Do the cues trade also in unambiguous stimuli? Four
different trading relations were examined using a fixed-standard AX
discrimination task with stimuli either from the vicinity of the
phonetic category boundary or from within a phonetic category. The
results suggest that certain trading relations (presumably of
auditory origin) hold in both conditions while others are tied to the
perception of phonetic contrasts and thus appear to be specific to
the speech mode.



Virtually any phonetic distinction has multiple correlates in the acoustic
speech signal. That is, the articulatory adjustments required to change
from one phonetic category to the other (other things equal) cause acoustic
changes along several separable physical dimensions: spectrum, amplitude,
time. While a listener typically perceives only a single change, viz., one of
phonetic category, the physical changes that led to this unitary percept can
only be described in the form of a list with multiple entries. When the
signal properties thus listed are manipulated individually in an experiment,
it is generally found that they all have perceptual cue value for the relevant
phonetic distinction, although they may differ in their relative importance.
If one cue in such an ensemble is changed to favor category B, another cue can
be modified to favor category A, so that the phonetic percept remains
unchanged. This is called a trading relation. Presumably, any two cues for
the same phonetic distinction can be traded off against each other within
limits set by their acceptable range of values and by their relative
perceptual weights. Numerous recent studies of trading relations have been
reviewed by Repp (1981b); some of them will be discussed further below.

The mechanisms by which a listener's brain combines a number of diverse
cues into a single phonetic percept are not known, but there are two
contrasting views on that issue. One view (e.g., Liberman & Studdert-Kennedy,
1978; Repp, Liberman, Eccardt, & Pesetsky, 1978) holds that the perceptual
integration of acoustic cues is motivated by their common origin in the
production of a phonetic contrast; that is, listeners are assumed to possess
and apply detailed tacit knowledge of the multiple acoustic correlates of
articulatory maneuvers. The other view (best spelled out in Pastore, 1981)
maintains that integration of, and trading relations between, acoustic cues
might arise either from integration or from interactions (such as masking or
contrast) at a purely auditory level of processing, without reference to the
articulatory origin of the cues. The evidence so far (summarized in Repp,
1981b) strongly favors the first view. However, it is conceivable that, as
more is learned about auditory mechanisms, certain trading relations between
acoustic cues will find auditory explanations, particularly those that seem to
have no good articulatory rationale. Since many perceptual trading relations
have been demonstrated with synthetic stimuli and without a parallel
examination of speech production, the relation of the perceptual results to what
happens in articulation may not always be as close as has been supposed, and
some trading relations may actually have been caused by auditory cue
interactions.

Undoubtedly, detailed studies of speech production and speech acoustics
as well as auditory psychophysics will shed further light on this issue.
There is a more direct experimental approach, however, which makes use of the
fact that, under certain circumstances, the same (or highly similar) stimuli
may be heard either as speech or as nonspeech. Such different percepts may be
achieved either by presenting speechlike stimuli to human listeners under
different instructions, relying primarily on the subjects' postexperimental
reports about whether the stimuli in fact sounded speechlike or not, or by
contrasting human perception of speech with that of nonhuman animals. In
either case, the demonstration of a trading relation in all subjects or in all
conditions would favor an auditory account, while the finding that a trading
relation holds only when human listeners claim to perceive the stimuli as
speech, but not when they claim to hear nonspeech sounds or when the listeners
are nonhuman, would constitute strong evidence in favor of the speech-specific
(articulatory-phonetic) origin of the trading relation.

There are no completed studies of trading relations in animals, but
interesting results are expected soon from several laboratories. For
chinchillas, Kuhl and Miller (1978) have reported a shift in the voicing boundary
for stop consonants with place of articulation, an effect that may, in part,
be due to a trading relation between voice onset time and formant onset
frequencies (cf. Summerfield & Haggard, 1977). A trading relation between
these two variables has also been demonstrated in human infants (Miller &
Eimas, Note 1); however, rather than pointing towards psychoacoustic
interactions, this finding may indicate that human infants are biologically prepared
for phonetic perception. The present experiments focus on several effects
that have not yet been demonstrated in either infant or animal subjects.

In studies using adult human subjects, two methods have been applied to
address the question of the origin of trading relations. One is to construct
stimuli that contain the critical cues under investigation but are
sufficiently different from speech in other respects, so as to be perceived as nonspeech
by naive subjects but as speech by more experienced or specially instructed
subjects. The technique of imitating the speech formants with pure tones has
served this purpose well (Bailey, Summerfield, & Dorman, 1977; Best,
Morrongiello, & Robson, 1981; Remez, Rubin, Pisoni, & Carrell, 1981). The other
method is to use speech stimuli and to lead listeners, through special
instructions and practice, to perceive them analytically, that is, to segregate them
into their auditory components, as it were. This is a notoriously difficult
task, but it is possible with certain special stimuli, e.g., with
fricative-vowel syllables (Repp, 1981a). In all of these studies, some of which will be
described in more detail below, subjects' response patterns were radically
different when the stimuli were heard as speech than when the same stimuli
were heard as nonspeech; in particular, the trading relations or other
contextual effects under investigation were observed only in the speech mode.
However, as noted above, this result may not hold for all trading relations.

The present experiments explored a third method, which has the advantage
of simplicity and general applicability, thus making possible the parallel
investigation of a number of different trading relations. The method is a
simplified version of a procedure used by Fitch, Halwes, Erickson, and
Liberman (1980) to demonstrate the categorical perception of speech stimuli
varying in two cue dimensions. Fitch et al. were concerned with a trading
relation between a temporal and a spectral cue for the "slit"-"split"
contrast: the amount of silence between the fricative noise and the periodic
stimulus portion, and the presence or absence of formant transitions
(appropriate for a labial stop) at the onset of the periodic portion. In an
identification task, less silence was needed to change "slit" to "split" when
formant transitions were present than when they were absent. In a subsequent
oddity discrimination task, Fitch et al. compared performance on three types
of trials: (1) spectral difference only ("one-cue condition"); (2) spectral
and temporal difference, the stimulus with the formant transitions always
having the longer silence ("two-cooperating-cues condition"); and (3) spectral
and temporal difference, but the stimulus with the formant transitions now
having the shorter silence ("two-conflicting-cues condition"). Subjects were
considerably more accurate in the second than in the third condition, with
performance in the first condition in between. This ordering of conditions
was predicted from the way the stimuli were labeled by the subjects. In
essence, these results revealed that speech stimuli varying on two dimensions
are still categorically perceived. The listeners appeared to base their
discrimination judgments on the phonetic labels of the stimuli, and thus the
trading relation between the two cues was exhibited in discrimination as well
as in labeling responses.

What would happen, however, if subjects could not rely on phonetic
labels? Such a situation would arise if the stimuli to be discriminated were
perceived as belonging to the same phonetic category. We know from many
earlier studies of categorical perception that such discriminations are
difficult to make, but subjects typically perform at a level better than
chance and their performance may be enhanced by increasing physical stimulus
differences and/or by using a paradigm that reduces stimulus uncertainty. If
subjects cannot rely on phonetic labels, they must make their discriminations
on the basis of the auditory properties of the stimuli. If some of these
properties interact at the auditory level of perception and thereby generate a
trading relation, then this trading relation should be observed regardless of
whether or not listeners can make phonetic distinctions. On the other hand,
if a trading relation is phonetic in origin, then the unavailability of
phonetic contrasts should lead to a disappearance of the trading relation.
Since, in this case, the cues are presumably independent at the auditory
level, a difference in two cues should be at least as easy to discriminate as
a difference in one cue (cf. Espinoza-Varas, Note 2), regardless of whether
the cue values are paired in the cooperating or the conflicting manner (a la
Fitch et al., 1980).

This is the rationale underlying the present experiments. To simplify
the design, the cooperating-cues condition was omitted. The critical
comparison was between 1-cue and 2-cue (conflicting-cues) trials in two
discrimination conditions: Between phonetic categories and Within a single phonetic
category. A trading relation in the Between condition (where stimuli
contrasted phonetically on some, but not all, trials) should show up as poorer
performance on 2-cue than on 1-cue trials. The same pattern in the Within
condition would suggest that the trading relation is auditory in origin. On
the other hand, equal or better performance on 2-cue than on 1-cue trials in
the Within condition would indicate that the trading relation is absent and,
therefore, that its occurrence in the Between condition has a phonetic basis.
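
The logic of these predictions can be written out as a simple decision rule. The sketch below is only a schematic restatement: the function name and its return strings are hypothetical, and it compares raw mean scores, whereas the actual inference rests on the significance tests described under Analysis.

```python
def interpret_pattern(between_1cue, between_2cue, within_1cue, within_2cue):
    """Map mean discrimination scores onto the predictions stated in the text.

    Illustrative only: ignores variability and statistical reliability.
    """
    if between_2cue >= between_1cue:
        # No cost of the conflicting cue where phonetic contrasts are available
        return "no trading relation evident in the Between condition"
    if within_2cue < within_1cue:
        # Same cost without phonetic contrasts: cues interact auditorily
        return "trading relation of auditory origin suggested"
    # Cost disappears (or reverses) within a category
    return "trading relation of phonetic origin suggested"
```

For example, poorer 2-cue than 1-cue performance in the Between condition combined with better 2-cue than 1-cue performance in the Within condition falls under the third case.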

Four different trading relations were investigated in four parallel 
experiments that were identical except for the stimuli and their dimensions of 
variation. Therefore, the general method will be described first, followed by 
a discussion of the individual experiments. 



GENERAL METHOD



Stimulus Tapes 

Each experiment employed speech stimuli (natural or synthetic words) 
varying on two cue dimensions for a specific phonetic contrast. One cue, the
primary cue, was always temporal in nature and assumed several different
values, whereas the other cue, the secondary cue, assumed only two different
values. Two sets of four values of the primary cue were selected: One set of
shorter values was intended to span the phonetic category boundary (Between
condition), while the other set had longer values intended to fall entirely
within the corresponding phonetic category (Within condition). Because
Weber's Law holds approximately for the discrimination of duration (e.g.,
Creelman, 1962), and to facilitate discrimination in the more difficult Within 
condition, the values in the Within stimulus set were spaced farther apart 
than those in the Between set. The two values of the secondary cue were 
chosen so as to be difficult to discriminate but still sufficiently different 
to generate an observable trading relation. 



A fixed-standard AX (same-different) discrimination task was used. This 
task has several advantages, which include low stimulus uncertainty (which 
tends to raise discrimination scores), relatively short test duration, and 
direct convertibility of the data into d' scores. The stimulus tapes for the
Between and Within conditions were identical except for the settings of the
primary cue. The fixed standard stimulus occurred first in each stimulus pair
and was constant throughout each condition; it had the shortest value of the
primary cue and the more conflicting of the two values of the secondary cue
(i.e., that value which, more than the other value, favored the same phonetic
category as did an increase in primary cue duration). Each condition
contained four blocks of stimulus pairs. The first block of 48 pairs was for 
practice only: On half the trials, the standard was paired with itself; on 
the other half, it was followed by that stimulus which had the longest value
of the primary cue but the same value as the standard of the secondary cue.
In other words, the practice block contained only identical and (relatively
easy) 1-cue trials. The first test block of 72 pairs contained the same pairs
as the practice block plus 24 2-cue trials. On these latter trials, the
difference in the primary cue between the standard and comparison stimuli was
the same as on 1-cue trials, but there was an added difference in the
secondary cue whose setting in the comparison stimulus "conflicted" with its
longer value of the primary cue, thus making discrimination more difficult if
(and only if) the two cues engaged in the predicted trading relation. The
remaining two test blocks of 72 trials each were similar except that the
magnitude of the difference in the primary cue was reduced, thus making the
task increasingly more difficult. This was done to counteract possible
ceiling effects due to individual differences in discrimination accuracy and
in phonetic boundary locations. It also served to explore a range of stimulus 
differences, since it was not known in advance how well naive subjects would 
perform in this task. 
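
The composition of the blocks can be made concrete with a short sketch. The trial labels, the function names, and the seeded random ordering are assumptions introduced for illustration; the text specifies only the number of pairs of each type.

```python
import random


def make_block(n_same, n_one_cue, n_two_cue=0, seed=0):
    """Return a shuffled list of trial-type labels for one block.

    Assumption: trials are randomly ordered within a block.
    """
    trials = ["same"] * n_same + ["1-cue"] * n_one_cue + ["2-cue"] * n_two_cue
    random.Random(seed).shuffle(trials)
    return trials


def proportion_different(trials):
    """Share of trials on which standard and comparison actually differ."""
    return sum(t != "same" for t in trials) / len(trials)


practice = make_block(24, 24)        # 48 pairs: identical and 1-cue only
test = make_block(24, 24, 24)        # 72 pairs: adds 24 2-cue trials
```

On this layout, half of the practice pairs are physically different, whereas two thirds of the pairs in each test block are, because the added 2-cue trials are all "different" trials.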

The standard and comparison stimuli in a pair were separated by 500 msec 
of silence. The interpair interval was 2 sec, and tliere were longer pauses 
between blocks. 

Procedure 

The subjects were tested individually or in small groups. The stimuli 
were presented over TDH-39 earphones at a comfortable intensity. All subjects
listened first to the Within condition, followed by the Between condition and
by a repetition of the Within condition. The repetition served to investigate
whether experience with phonetic contrasts in the Between condition had any
effect on subjects' strategies in the Within condition; it also gave a second
chance to those subjects who found this condition very difficult the first
time. In all experiments except the first, the discrimination tests were
followed by a brief labeling test in which the seven different stimuli used in
the Between condition were presented 10 times in random order. (The labeling
test for Exp. 1 was administered at the end of Exp. 4b.) This test was added
to verify the trading relation between the two cues. 


Instructions were kept to a minimum. The subjects were told about the
general procedure and about the relative difficulty of the task. They were
not informed about the difference between the two experimental conditions
(except that the stimuli would be slightly different), and they were not told
the relevant phonetic labels or the auditory cue dimensions that varied.
Rather, they were left to discover these by themselves as they listened to the
48 practice trials. For these trials only, the correct responses (s, d) were
printed on the answer sheet, and the subjects merely checked them off as they
went along. It was hoped that, after this experience, the subjects would have
some idea of the difference to listen for (i.e., that in the primary cue
dimension). They were told that the differences in the subsequent test blocks
were of the same kind, but that they would get smaller in magnitude. They
were not informed about the introduction of another kind of difference (that
in the secondary cue dimension) or of the consequent increase in the true
proportion of "different" trials from 50 to 67 percent, but it was mentioned
that any kind of difference perceived warranted a response of "different."
Clearly, the procedure was designed to focus the subjects' attention on the
primary cue, since only this cue varied in the practice block.

The subjects responded by writing down "s" or "d" on each trial, guessing
if necessary. After each of the three test conditions, they were interviewed
about their impressions and strategies. In the final labeling test, they
chose from the two relevant categories (which they were told) and wrote down 
their responses in abbreviated form. 

Analysis 

Individual subject scores in each test block were converted into d'
values, taking the proportions of "different" responses on 1-cue and 2-cue
trials, respectively, as separate hit rates and the proportion of "different"
responses on trials of identical stimuli as the joint false-alarm rate.
Proportions of 0 and 1 were treated as .01 and .99, respectively, thus
limiting d' to a maximum value of 4.66.
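
A minimal sketch of this conversion follows, assuming the standard yes-no formula d' = z(hit rate) - z(false-alarm rate); the text does not state which detection model was used, and the function names are hypothetical. Under this assumption the ceiling works out to about 4.65, which agrees with the reported 4.66 up to rounding of the z values.

```python
from statistics import NormalDist


def clamp(p, lo=0.01, hi=0.99):
    """Treat proportions of 0 and 1 as .01 and .99, as described in the text."""
    return min(max(p, lo), hi)


def d_prime(hit_rate, false_alarm_rate):
    """d' = z(H) - z(FA); the yes-no formula is an assumption of this sketch."""
    z = NormalDist().inv_cdf            # inverse of the standard normal CDF
    return z(clamp(hit_rate)) - z(clamp(false_alarm_rate))


def block_scores(p_diff_1cue, p_diff_2cue, p_diff_same):
    """Separate d' values for 1-cue and 2-cue trials with a joint false-alarm rate."""
    return d_prime(p_diff_1cue, p_diff_same), d_prime(p_diff_2cue, p_diff_same)
```

Calling `d_prime(1.0, 0.0)` returns the ceiling value, and equal hit and false-alarm rates give a d' of zero.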

Three analyses of variance were conducted on subjects' d' scores in each
experiment. The first analysis was on the Between condition only, with the
factors Cues (1-cue vs. 2-cue) and Blocks (three levels of difficulty). The
second analysis was on the Within condition only, with the factors
Repetitions, Cues, and Blocks. (In Exp. 3, only the second repetition was
analyzed.) The absence of any interactions between Repetitions and the other
factors justified the combination of the two repetitions for the third
analysis, which compared the Between and Within conditions with the factors
Conditions, Cues, and Blocks. The critical effect in this last analysis was
the Conditions by Cues interaction, which was expected to reveal whether or
not the same trading relation (or other response pattern) held in the two
conditions.



EXPERIMENT 1: "SAY"-"STAY"

The trading relation studied here concerned, as the primary cue, the
amount of silence following the fricative noise and, as the secondary cue, the
onset frequency of the first formant (F1) following the silence. This trading
relation, which is similar to that for "slit"-"split" studied by Fitch et
al. (1980), has been previously investigated by Best et al. (1981): Less
silence is needed to change "say" to "stay" when F1 starts at a lower
frequency. Best et al. confirmed this trading relation in two different
discrimination tests (oddity and variable-standard AX). These tests actually
included some within-category trials along with between-category trials, and
the trading relation could be seen to disappear within the "stay" category.
However, this result is not conclusive, since it may reflect a floor effect
and is based on rather few responses. It is interesting to note, however,
that the similar data of Fitch et al. (1980) for the "slit"-"split" contrast,
although they are open to the same objections, actually suggest a reversal of
the trading relation in the within-category regions: Whereas the ordering of
performance on the three types of trials was cooperating cues > one cue >
conflicting cues in the phonetic boundary region, it changed to cooperating
cues = conflicting cues > one cue (at chance) within categories. This is
exactly the pattern one should expect from a trading relation that is specific
to phonetic perception.

This expectation was further confirmed by Best et al. (1981), in an
elegant study with "sine-wave analogs" of "say"-"stay" stimuli. Subjects who
reported that they heard the sine-wave stimuli as (highly unnatural) tokens of
"say" or "stay" exhibited the same trading relation between silence duration
and F1-analog onset frequency as was observed in speech stimuli, whereas
those subjects who heard the sine-wave stimuli as nonspeech showed a radically
different pattern of responses that suggested that they paid selective
attention to variations in one or the other cue. They neither integrated the
cues into a unitary percept, nor did the settings of the unattended cue have
much effect on the perception of the attended cue.

Given these rather convincing results, the present re-investigation of
the "say"-"stay" contrast served not only to replicate the findings of Best et
al. but also to validate the new procedure. The prediction was, then, that
the trading relation between silence duration and F1 onset frequency would be
observed only in the Between condition but not in the Within condition.

Method 

Subjects. Eleven volunteers were recruited by announcements on the Yale
University campus and were paid for their participation. Most of them had
served in earlier speech perception experiments. A different group of 9
subjects (those of Exp. 4b) took the brief labeling test.

Stimuli. The stimuli were hybrids composed of a natural-speech [s] noise
followed by a synthetic periodic portion, the [s] noise derived from a male
speaker's utterance of [sa]. The periodic portion was produced on the OVE
IIIc serial resonance synthesizer at Haskins Laboratories, following formant
specifications provided by Best et al. (1981) in their Figure 1 (speaker SSB).
The fricative noise was 212 msec long, with a gradually rising amplitude over
the first 170 msec and a rapid fall thereafter. The duration of the synthetic
periodic portion was 300 msec. It had a fairly abrupt onset and a fundamental
frequency that fell linearly from 110 to 80 Hz.

The two stimulus portions were concatenated after both had been digitized
at 10 kHz using the Haskins Laboratories PCM system. The primary cue was the
amount of silence between them. In the Between condition, the standard
stimulus had no silence at all ("say"), and the comparison stimuli had 30, 20,
and 10 msec, respectively, on "different" trials in the three test blocks. In
the Within condition, the standard had 70 msec of silence ("stay"), and the
comparison values were 130, 115, and 100 msec. The "say"-"stay" boundary was
expected to be in the vicinity of 20 msec of silence. The secondary cue was
the onset frequency of F1 in the periodic portion. On 1-cue trials, it was
200 Hz, whereas, on 2-cue trials, it was raised to 299 Hz, a cue favoring
"say" and thus "conflicting" with the longer silence cue in the comparison
stimuli. The difference in F1 between the two versions of the periodic
stimulus portions gradually diminished over the first 40 msec (the extent of
the F1 transition).

Results

The results are shown in Figure 1. The first panel shows average d' 
scores in the Between condition. Discrimination performance was high in the 
first block but decreased rapidly as the difference in the primary cue was 
reduced, F(2,20) = 24.4, p < .001. As predicted from the trading relation 
between the primary and secondary cues, performance was higher on 1-cue than 
on 2-cue trials; however, this difference did not reach significance due to 
high intersubject variability, F(1,10) = 3.7, p < .10. The Blocks by Cues 
interaction was likewise nonsignificant. 
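
For readers unfamiliar with the d' measure, the sketch below shows one common way of obtaining it from response counts. It is an assumption-laden illustration, not the procedure used in the study: this section does not say how d' was derived from the same-different responses, and a model specific to the same-different task would yield different values. The function name d_prime and the log-linear correction are choices made here for illustration only.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Simple d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count of interest and 1 to
    each total) keeps the rates away from 0 and 1, where z is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical counts: "different" responses on different and same trials.
    print(round(d_prime(80, 20, 20, 80), 2))
```

With equal hit and false-alarm rates the function returns zero, which corresponds to chance performance.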

The second panel of Figure 1 shows the results of the Within condition. 
These results represent the combined (i.e., averaged d') scores of the two 
repetitions of this condition, which exhibited highly similar response 
patterns. Performance was only slightly better in the second run, F(1,10) = 4.0, 
p < .10; no factor interacted with Repetitions. Discrimination scores started 
at a lower level in this condition than in the Between condition, even though 
the difference in the primary cue was twice as large. Performance declined 
over blocks, F(2,20) = 14.2, p < .001, and this effect did not interact with 
Cues. Most importantly, the difference between the two types of trials was 
reversed here, performance being better on 2-cue than on 1-cue trials, F(1,10) 
= 12.1, p < .01. This reversal was confirmed by a significant Conditions by 
Cues interaction in the joint analysis of the Between and Within conditions, 
F(1,10) = 6.6, p < .05. 

The third panel of Figure 1 shows the labeling data for the stimuli used 
in the Between condition, obtained from a different group of subjects. One 
listener perceived all stimuli as "say" and was excluded. The data of the 
remaining eight listeners confirm that the standard stimulus (no silence) was 
heard as "say" and that the "say"-"stay" boundary fell between 20 and 25 msec, 
as expected. The labeling data also exhibit the trading relation between the 
two cues, with fewer "stay" responses to the 2-cue (i.e., conflicting-cues) 
stimuli. However, this difference once more did not reach significance 
because of high intersubject variability, F(1,7) = 4.0, p < .10. 

Discussion 

Basically, the results confirmed the predictions: A trading relation 
between the two cues appeared, though not very reliably, in the region of the 
"say"-"stay" boundary, whereas it was clearly absent within the "stay" 
category. This suggests, in accordance with the findings of Best et al. 
(1981), that the trading relation between silence duration and F1 onset 
frequency is phonetic, rather than auditory, in origin. 

The present data are somewhat weakened by the nonsignificance of the 
trading relation in the Between condition and in the labeling task. However, 
we must also consider that (1) the difference in the secondary cue was rather 
small and (2) the stimuli were presented in a discrimination paradigm that may 
have facilitated the detection of auditory stimulus differences in the Between 
condition, even more so as this condition was preceded by the Within 
condition, which required auditory discrimination of similar differences. Any 
phonetic trading relation between the relevant cues (or, rather, its 
manifestation as superior performance on 1-cue trials) would be weakened by 
auditory discrimination beyond the detection of phonetic differences, since 
auditory discrimination bestows an advantage on 2-cue trials. Therefore, the 
critical result is the change across conditions in the relation between 1-cue 
and 2-cue discrimination, a change that was significant in the present 
experiment. 

It is conceivable, of course, that an auditory trading relation between 
silence duration and F1 onset frequency exists when the silence is short but 
not when it is long. The most plausible form of this hypothesis would be that 
the presence of a silent interval is more difficult to detect when F1 has a 
higher onset, but that the perceived duration of longer silent intervals is 
unaffected by F1 onset frequency. This hypothesis is consistent with the 
present data, but it seems unlikely in view of the Best et al. (1981) 
findings. Specifically, these authors found that subjects who perceived 
sine-wave analogs of "say"-"stay" stimuli nonphonetically and focused on the 
silence cue were not at all affected by F1 (-analog) onset frequency, even when 
the silence durations were in the short range. 

In the Best et al. study, it was found that listeners who followed an 
auditory strategy focused on one cue and ignored the other. In the present 
Within condition, selective attention to the silence cue would have resulted 
in equal scores on 1-cue and 2-cue trials, both declining over blocks, whereas 
selective attention to the spectral cue would have resulted in much better 
performance on 2-cue than on 1-cue trials, with no decline in 2-cue 
discrimination performance over blocks. However, no subject exhibited this 
second pattern, and few exhibited the first. Thus, the average data (Fig. 1) 
are fairly typical of the individual subject; they are not an artifact of 
averaging over subjects with radically different strategies. It seems likely, 
then, that the present subjects took both cues into account, even though the 
practice trials encouraged selective attention to the primary cue and 
subjects' reports indicated that they had little awareness of the (rather 
small) difference in the secondary cue. In that case, the higher scores on 
2-cue than on 1-cue trials simply show that stimuli differing on two dimensions 
are easier to discriminate than stimuli differing on one dimension only, which 
is perfectly plausible and consistent with the relative auditory independence 
of the two cues shown by Best et al. (1981). Their finding that subjects paid 
selective attention to one or the other cue was probably due to their 
paradigm, an AXB classification task in which the two cues were perfectly 
correlated in the reference stimuli (A, B). Thus, their subjects were 
encouraged to select one cue and ignore the other, redundant one; in fact, 
this strategy simplified the subjects' task. The present discrimination 
task, on the other hand, while it emphasized the silence cue, encouraged 
listeners to pay attention to all possible stimulus differences. The ability 
of subjects to make use of both cue dimensions in one task is not inconsistent 
with their ability to select only one of them in a different task, since 
either strategy may be followed with independent auditory dimensions. 

It should be noted that the advantage of 2-cue over 1-cue trials in the 
Within condition did not increase over blocks (as might be expected if 
subjects began to direct their attention to the secondary cue as the 
difference in the primary cue got smaller) but remained constant at about 0.3 
d', which provides an estimate of the (rather poor) discriminability of the 
secondary-cue difference, assuming that the discriminabilities of the two cues 
were additive. Another feature of the data worth mentioning is the apparent 
convergence of the 1-cue and 2-cue scores in the last block of the Between 
condition; although this effect was not significant, it was quite clearly 
exhibited by several individual subjects. Note that the phonetic trading 
relation between the cues is expected to disappear not only within the "stay" 
category but also within the "say" category, a situation approximated by the 
last block of the Between condition. 



EXPERIMENT 2: "SAY SHOP"-"SAY CHOP" 

The trading relation investigated in this experiment involved the same 
primary cue as in Experiment 1, duration of silence, but a different 
secondary cue: the duration of the fricative noise following the silence. The 
trading relation between these two cues was demonstrated by Repp et al. 
(1978): More silence was needed to turn "say shop" into "say chop" when the 
fricative noise was long than when it was short. 

This trading relation has much in common with that of Experiment 1; 
however, it does involve two cues varying along the same physical dimension 
(duration), which makes an auditory interaction perhaps more likely than 
between a temporal and a spectral dimension. For example, there may be a 
contrastive effect, such that a long fricative noise makes the preceding 
silence sound relatively short (or vice versa), which would lead to the 
observed trading relation. The present study put this hypothesis to test, 
using the same paradigm as Experiment 1. If there is an auditory interaction 
between the two temporal cues, then it should surface regardless of whether or 
not subjects perceive phonetic contrasts. 

Method 

Subjects. Ten volunteers participated, two of whom had also been 
subjects in Experiment 1 and six of whom had previously been subjects in 
Experiment 3b. 

Stimuli. The stimuli were created on the OVE IIIc synthesizer. Formant 
parameters were copied from a spectrogram of "say shop" produced by a male 
speaker (as used in Repp et al., 1978). Synthetic stimuli were used because 
it turned out to be difficult to change the duration of a natural fricative 
noise without audible clicks or other discontinuities. The initial 2$0-msec 
"say" portion was followed by a variable silent interval, a fricative noise of 
variable duration, and a 125-msec final periodic portion ("op") whose first 10 
msec overlapped the last 10 msec of the fricative noise. The fricative noise 
reached maximum amplitude after 50 msec. Fundamental frequency rose from 85 
to 100 Hz during the "say" portion and fell from 100 to 90 Hz during the "op" 
portion. 

The primary cue was the amount of silence preceding the fricative noise. 
In the Between condition, the standard stimulus had no silence at all ("say 
shop"), and the comparison stimuli had 30, 20, and 10 msec, respectively, on 
"different" trials in the three test blocks, just as in Experiment 1. In the 
Within condition, the standard had ^0 msec of silence ("say chop"), and the 
comparison values were 100, 80, and 60 msec. The "say shop"-"say chop" 
boundary was expected to be in the vicinity of 20 msec of silence. The 
secondary cue was the duration of the fricative noise in the second syllable. 
On 1-cue trials, its duration was 110 msec, whereas, on 2-cue trials, it was 
130 msec, thus biasing perception more towards "say shop." The duration of the 
noise was changed at the synthesis stage by extending its central steady-state 
portion. The stimulus tapes were recorded directly from the synthesizer, 
without digitization of stimuli, so the fricative noise waveforms exhibited 
natural random variability across tokens. 

Results 

The results are shown in Figure 2. The first panel shows that the 
average performance level in the Between condition was similar to that in 
Experiment 1 (where the same values of silence had been employed), with a 
similarly striking decline over blocks, F(2,18) = 11.8, p < .001. However, 
there was no difference between 1-cue and 2-cue trials; in other words, the 
trading relation did not emerge. 

In the Within condition (second panel of Fig. 2), performance was 
somewhat lower despite the larger differences in the primary cue. Performance 
declined over blocks, F(2,18) = 16.9, p < .001. In addition, however, 
accuracy on 2-cue trials was a good deal better than on 1-cue trials, F(1,9) = 
32.3, p < .001. This difference seemed to increase over blocks, but the Cues 
by Blocks interaction did not reach significance. There was no significant 
effect involving Repetitions. The joint analysis of the Between and Within 
conditions revealed a significant Conditions by Cues interaction, F(1,9) = 
22.4, p < .002, which confirmed the different effects that addition of a 
secondary cue had in the two conditions. 

The labeling results (third panel of Fig. 2), obtained from the same 
group of subjects, revealed that the standard was always heard as "say shop" 
and that the phonetic category boundary fell between 20 and 25 msec, as 
expected. However, there was also the expected trading relation, with more 
"say chop" responses to stimuli containing the shorter noise, F(1,9) = 16.9, 
p < .01. Thus, the trading relation was exhibited in labeling but not in 
Between discrimination. 

The reliability of the pattern of results shown in Figure 2 was confirmed 
by the results of the author and his research assistant, who took the test as 
pilot subjects. Both showed the pattern in especially clear form: No trading 
relation in the Between condition but a large advantage for 2-cue trials in 
the Within condition. 

Discussion 

Except for the complete absence of a trading relation in the Between 
condition, the present data are quite similar to those of Experiment 1, 
suggesting that the trading relation between silence and fricative noise 
durations is similar to that between silence duration and F1 onset frequency 
and that both are phonetic in origin. Both, of course, concern the perception 
of the same phonetic contrast: stop manner. As in Experiment 1, the critical 
finding is the Conditions by Cues interaction, which reflects the change in 
the difference between 1-cue and 2-cue trials across conditions. The absence 
of a trading relation in the Between condition is probably due to listeners' 
detection of auditory differences in addition to the phonetic contrast. Since 
the difference in the secondary cue was more noticeable here than in 
Experiment 1 (as suggested by the larger difference between 1-cue and 2-cue 
trials in the Within condition), the resulting auditory advantage for 2-cue 
trials may have completely canceled the advantage for 1-cue trials due to the 
phonetic trading relation in the Between condition. 

The difference between the 1-cue and 2-cue d' functions in the Within 
condition suggests that the discriminability of the secondary cue difference 
was about 0.4 d' at the outset and increased to 0.9 d' in the last block, 
where discrimination on 1-cue trials was at chance. Although this increase 
did not reach significance, it does suggest that some subjects directed their 
attention towards the noise duration difference as the silence duration 
difference got smaller. The data also suggest, surprisingly, that the 
difference between a 130-msec and a 110-msec noise was much easier to detect 
than the difference between a $&-msec and a 60-msec silence (Within condition, 
last block). Since this finding contradicts Weber's Law, it indicates that 
silence and noise durations are not equivalently represented on the subjective 
temporal dimension. 

An auditory hypothesis compatible with the present data would be that 
detection of silence is not affected by the duration of a following noise 
segment, while the perceived duration of a longer silence is increased when 
the duration of the noise is increased. The direction of this hypothetical 
effect does not seem right, but at present there is no direct evidence against 
this hypothesis. The relevant psychoacoustic experiments remain to be done. 



EXPERIMENT 3: "GOAT"-"COAT" 



This study was concerned with a trading relation reported by Repp (1979). 
When voice onset time (VOT) is used as the primary cue to the voicing of an 
utterance-initial stop consonant, less increase in VOT is needed to turn a 
voiced stop into a voiceless one when the amplitude of the aspiration noise 
(whose duration is the VOT) is reduced. This trading relation is different in 
two important respects from those investigated in Experiments 1 and 2. First, 
the two interacting cues are both properties of the same signal portion, viz., 
of the aspiration noise that precedes voicing onset. Second, it appears that 
there is no good articulatory rationale for this trading relation. Although 
the relevant measurements have not been done, it seems likely that the 
amplitude of aspiration, measured at a fixed distance from the release, would 
be about the same in voiced and voiceless stops. It is true, of course, that 
voiced stops have a much shorter period of aspiration, and this necessary 
covariation of aspiration duration and time-integrated amplitude may be 
sufficient to account for the perceptual trading relation. Still, the 
articulatory explanation seems less compelling than that for other effects, 
where different cues can be shown to be acoustically diverse consequences of 
the same articulatory act (cf. Repp et al., 1978). Moreover, there are 
well-known instances of trade-offs between duration and amplitude at the 
auditory threshold and in judgments of loudness (e.g., Garner & Miller, 1947; 
Small, Brandt, & Cox, 1962). For these reasons, the present trading relation 
may well be auditory in origin. If so, it was predicted to occur in both 
conditions of Experiment 3; that is, performance was expected to be higher on 
1-cue than 2-cue trials in both the Between and Within conditions. 

Experiment 3 was run twice. The first run (Exp. 3a) was only partially 
successful because the stimuli in the Between condition turned out to have 
missed the boundary (their VOTs were too long), so that the Between condition 
was effectively another Within condition. Also, the VOT differences were 
rather small, so that the subjects were in great trouble. Therefore, a 
replication (Exp. 3b) was conducted with shorter VOT values in the Between 
condition and larger VOT differences. Results from both runs will be 
reported. The labeling test was administered at the end of Experiment 3b. 

Method 

Subjects. Eight volunteers participated in Experiment 3a. All of them 
had previously been subjects in Experiment 1. There were nine subjects in 
Experiment 3b, two of whom had also been in Experiment 3a. 

Stimuli. In contrast to the previous stimuli, the present ones were 
modified natural speech. A female speaker recorded the words "goat" and 
"coat." They were digitized at 10 kHz, and a VOT continuum was constructed by 
first replacing the burst and aspiration portions of "goat" (22 msec) with the 
first 22 msec of "coat" and by then substituting additional equivalent amounts 
of aspiration noise from "coat" (VOT = 66 msec) for each successive pitch 
period of "goat." For a detailed description of this procedure, see the 
appendix in Ganong (1980). 

Stimuli from this continuum were used in the Between condition only. For 
the Within condition, where VOTs longer than that of the natural "coat" were 
required, the stimuli were generated by a different procedure. Note that, in 
the method described above, total stimulus duration remains constant as VOT is 
increased while the periodic stimulus portion is progressively shortened. 
This is standard procedure for VOT continua and probably does not matter when 
relatively short VOTs are to be discriminated. However, when VOTs are made 
rather long, little is left of the periodic portion, and informal observations 
have shown that removal of even a single pitch period may become perceptually 
quite salient. That is, subjects may discriminate such stimuli not on the 
basis of VOT but on the basis of changes in the duration and intonation of the 
"vowel." To prevent this from happening in the present Within condition, the 
periodic stimulus portion was held constant, and VOT was further increased by 
duplicating randomly selected segments of the final portion of the aspiration 
noise, where the formant transitions presumably were close to asymptote. 

Thus, the stimuli in the Between condition had a total duration of 228 
msec (VOT plus periodic portion), with the periodic portion diminishing as VOT 
increased, whereas the stimuli in the Within condition had a constant periodic 
portion of 155 msec, and total duration increased with VOT. All stimuli 
included, in addition, a rather powerful final [t] release burst of 
approximately 112 msec duration, which was separated from the end of the 
periodic portion by a 133-msec silent closure interval. 

The primary cue in this study was, of course, VOT (i.e., the duration of 
the aperiodic portion at stimulus onset). In the Between condition of 
Experiment 3b (that of Experiment 3a will not concern us here, since 
performance was at chance), the standard had a VOT of 38 msec (which seems 
rather long but was still heard as "goat"), and the comparison stimuli had 
VOTs of 55, 49, and 44 msec, respectively. In the Within condition of 
Experiment 3b, the standard had a VOT of 73 msec ("coat"), and the comparison 
stimuli had values of 108, 98, and 85 msec, respectively. In Experiment 3a, 
the same standard was used, but the comparison stimuli had values of 98, 91, 
and 85 msec. The secondary cue was the amplitude of the aperiodic stimulus 
portion. On 2-cue trials, it was reduced by 6 dB SPL in the comparison 
stimulus, counteracting the longer VOT of that stimulus. This manipulation 
was performed on the digitized waveform, using computer instructions. 
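
The two ways of lengthening VOT imply different duration bookkeeping, which the following Python sketch makes explicit. It is only an illustration of the arithmetic stated above: the function and constant names are hypothetical, and the final [t] burst and closure interval, which were common to all stimuli, are left out.

```python
BETWEEN_TOTAL_MSEC = 228    # VOT plus periodic portion, held constant
WITHIN_PERIODIC_MSEC = 155  # periodic portion, held constant


def between_stimulus(vot_msec):
    """Between procedure: aspiration replaces pitch periods, total is fixed."""
    return {
        "vot": vot_msec,
        "periodic": BETWEEN_TOTAL_MSEC - vot_msec,
        "total": BETWEEN_TOTAL_MSEC,
    }


def within_stimulus(vot_msec):
    """Within procedure: aspiration is lengthened, periodic portion is fixed."""
    return {
        "vot": vot_msec,
        "periodic": WITHIN_PERIODIC_MSEC,
        "total": vot_msec + WITHIN_PERIODIC_MSEC,
    }


if __name__ == "__main__":
    # VOT values (msec) of Experiment 3b: standard first, then comparisons.
    for vot in (38, 55, 49, 44):
        print("Between", between_stimulus(vot))
    for vot in (73, 108, 98, 85):
        print("Within", within_stimulus(vot))
```

On these figures, a VOT of 73 msec (the Within standard) yields the same durations under either procedure, since 228 - 73 = 155.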

Results 

Within-category discrimination of the "goat"-"coat" stimuli proved to be 
a difficult task for naive subjects. One problem seemed to be to discover the 
dimension on which the stimuli differed. (Recall that the nature of the 
difference was not revealed in the instructions but had to be detected during 
the practice block.) In Experiment 3a, performance in the first presentation 
of the Within condition was close to chance (average d' = 0.31), and there was 
no difference between 1-cue and 2-cue trials. A similar result was obtained 
in the Between condition where, because of inappropriately long VOTs, all but 
one subject heard only "coat" and performed at chance level. The single 
subject who appeared to be able to make use of phonetic contrasts performed 
quite well and had higher scores on 1-cue than on 2-cue trials, in accord with 
the expected trading relation. Prompted by subjects' complaints over the 
difficulty of the task, the experimenter told them before the repetition of 
the Within condition what kind of difference to listen for, and he produced 
exaggerated examples of stops with different amounts of aspiration to 
illustrate the point. This had a striking effect on (most) subjects' 
performance. The results from this final condition of Experiment 3a are 
presented in the second panel of Figure 3 (the functions labeled "a"). It can 
be seen that performance was better on 1-cue than on 2-cue trials, F(1,7) = 
5.7, p < .05. This pattern contrasts with that obtained in the Within 
conditions of Experiments 1 and 2, where the opposite difference was observed. 
Due to large variability, neither the decline in performance across blocks nor 
the Blocks by Cues interaction reached significance. 

The subjects in Experiment 3b were told right at the outset to direct 
their attention to the initial portion of the stimuli; however, they were not 
told the precise nature of the difference to listen for. Surprisingly, the 
hint did not help. Performance in the first Within condition was poor, 
despite the increased VOT differences (average d' = 0.23), and there was no 
clear difference between 1-cue and 2-cue trials. Therefore, these data were 
again discarded. However, the choice of VOT values for the Between condition 
was more successful this time; these results are shown in the first panel of 
Figure 3. Subjects performed at a level comparable to that in Experiments 1 
and 2, although the durational differences were somewhat smaller here. 
Performance declined over blocks, F(2,16) = 5.6, p < .05. Scores were higher 
on 1-cue than on 2-cue trials, F(1,8), p < .01, which reflects the expected 
trading relation. 

The results of the repetition of the Within condition are shown in the 
second panel of Figure 3 (labeled "b"). These subjects, too, were told what 
difference to listen for before they repeated the Within condition. However, 
their performance improved less than that of the subjects in Experiment 3a. 
Although better than chance, on the average, scores were low and highly 
variable. Neither the Blocks effect nor the Cues effect was significant; 
note, however, a tendency for 1-cue discrimination to be higher than 2-cue 
discrimination. This tendency is supported not only by the results of 
Experiment 3a but also by the data of a research assistant who served as a 
pilot subject and showed a striking advantage for 1-cue trials in both 
conditions. The Cues by Conditions interaction was nonsignificant. 

The third panel of Figure 3 shows labeling data deriving from six of the 
subjects plus the research assistant. (Three subjects had already been tested 
before it was decided to add the labeling test.) These data confirm that the 
standard stimulus (VOT = 38 msec) was perceived as "goat," and they also show 
the expected trading relation, although it fell short of significance, F(1,6) 
= 5.1, p < .10. 

Discussion 

The results of this experiment are stronger in terms of what they do not 
show than in what they do show. The most significant finding is the absence 
of an advantage for 2-cue trials in the Within condition. The data suggest 
that, on the contrary, there was an advantage for 1-cue trials in both the 
Within and Between conditions. This pattern of results is the one expected 
for a trading relation of psychoacoustic origin. The interaction between 
aspiration noise duration and amplitude may be similar to other kinds of 
auditory time-intensity trade-offs. 
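
The logic that separates the two kinds of origin in Experiments 1 to 3 can be summarized as a decision rule on the sign of the 1-cue advantage in each condition. The Python sketch below is a deliberate simplification offered for illustration: the function name classify_pattern and its labels are hypothetical, the inputs are assumed to be mean d' differences (1-cue minus 2-cue), and the rule ignores the significance tests on which the actual conclusions rest.

```python
def classify_pattern(between_advantage, within_advantage):
    """Classify a result pattern from two d' differences (1-cue minus 2-cue).

    auditory:      1-cue advantage in both conditions (as in Experiment 3).
    phonetic:      2-cue advantage within a category, so the cues do not
                   trade at the auditory level (as in Experiments 1 and 2).
    indeterminate: anything else.
    """
    if between_advantage > 0 and within_advantage > 0:
        return "auditory"
    if within_advantage < 0:
        return "phonetic"
    return "indeterminate"


if __name__ == "__main__":
    # Hypothetical difference scores, chosen only to exercise each branch.
    print(classify_pattern(0.2, 0.3))
    print(classify_pattern(0.2, -0.3))
    print(classify_pattern(0.0, 0.0))
```

Note that a phonetic trading relation need not produce a reliable 1-cue advantage in the Between condition, since auditory discrimination of the secondary cue can cancel it; for that reason the rule keys on the Within condition.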

EXPERIMENT 4: "CHOP"-"SHOP" 

The trading relation studied in this last experiment has been known for a 
long time; it concerns fricative noise duration and rise-time (i.e., the time 
from noise onset to the point of maximum amplitude) as joint cues to the 
fricative-affricate distinction. Gerstman (1957) showed that, to turn an 
utterance-initial [ʃ] into a [tʃ], the noise duration needs to be shortened 
more if its rise-time is slow; or, conversely, its rise-time must be shortened 
more if noise duration is long. Gerstman excluded the rise-time portion from 
his measure of noise duration, thus confounding total noise duration with the 
rise-time variable. Van Heuven (1979) recently reanalyzed Gerstman's data and 
found that total noise duration accounted for nearly all the variance; 
rise-time made only a small contribution to perception. Still, it can hardly be 
doubted that amplitude rise-time has some cue value for the 
fricative-affricate distinction. Although some relevant studies have confounded 
rise-time with amplitude at onset, which itself may be an important cue (e.g., 
Dorman, Raphael, & Liberman, 1979; Exp. 5), others have shown rise-time 
proper to be a sufficient cue (e.g., Cutting & Rosner, 1974; Rosen & Howell, 
1981). Thus, it seems likely that rise-time can be traded against noise 
duration, at least within certain limits. 

Like the trading relation investigated in Experiment 3, that between the 
present two cues engages two properties of the same signal portion. It is 
possible that these properties interact at the auditory level to determine the 
perceived duration of the noise, or possibly its perceived abruptness of 
onset. However, the present trading relation, unlike that of Experiment 3, 
also has a good articulatory explanation: Naturally produced fricatives and 
affricates differ in both noise duration and rise-time. Experiment 4 was 
expected to shed light on the origin of this trading relation. 

Experiment 4 actually consisted of two experiments, identical except for 
the stimuli. In Experiment 4a, the full "chop"-"shop" stimuli were used. In 
Experiment 4b, only the fricative noise portions were presented. This second 
experiment was intended to serve as a kind of nonspeech control for the first, 
since informal observations had suggested that the isolated fricative noises 
did not invite phonetic categorization as "sh" or "ch," or in any case were 
more difficult to label than the full stimuli. It was expected that whatever 
phonetic effects might be present in the Between condition of Experiment 4a 
would be absent in the corresponding condition of Experiment 4b. 

Method 



Subjects. Nine volunteers participated, five of whom had also been 
subjects in earlier experiments. All subjects took Experiment 4a first, then 
Experiment 4b on a separate day. 

Stimuli. The stimuli were created on the OVE IIIc synthesizer; they were 
derived from the second halves of the stimuli of Experiment 2. The choice of 
cue values for the Between condition was guided by Gerstman's (1957) data. The 
primary cue was fricative noise duration. In the Between condition, the 
duration was 70 msec for the standard (intended to be heard as "chop") and 
100, 90, and 80 msec, respectively, for the comparison stimuli. In the Within 
condition, the standard had a 140-msec noise ("shop"), and the comparison 
values were 200, 180, and 160 msec. The secondary cue was the rise-time of 
the noise. On 1-cue trials, it was 60 msec; on 2-cue trials, it was reduced 
to 30 msec (favoring "chop" percepts). In each case, the amplitude rise was 
linear and onset amplitude was set at the minimum value possible in synthesis; 
amplitude parameter values for the two different rise-times began to diverge 
after the initial 5 msec. The accuracy of the rise-times was verified by 
digitizing and displaying the waveforms of the stimuli. Stimulus tapes were 
recorded directly from the synthesizer to avoid artifacts due to "frozen" 
noise waveforms. 

Results 

The results of Experiment 4a are the functions labeled "a" in Figure 4. 
Performance in the Between condition was again comparable to that in previous 
experiments; the decline over blocks was significant, F(2,16) = 11.0, p < 
.001. However, there was no difference between 1-cue and 2-cue trials. A 
slight advantage for 1-cue trials at the outset changed to a slight advantage 
for 2-cue trials in the last block, but the Blocks by Cues interaction was not 
significant. 

Surprisingly, the results of the Within condition were remarkably similar 
to those of the Between condition. There was no significant effect involving 
the Repetitions factor. Performance declined over blocks, F(2,16) = 26.5, p < 
.001, and an advantage for 2-cue trials emerged in the second and third 
blocks. The Blocks by Cues interaction reached significance here, F(2,16) = 
4.2, p < .05. This interaction was also obtained in the joint analysis of the 
Between and Within conditions, F(2,16) = 7.4, p < .01, with no triple 
interaction involving Conditions, which confirms the similarity of the 
response patterns in the two conditions. 

The labeling data were less tidy than in the earlier experiments; in 
particular, the standard stimulus was not an unequivocal "chop" for all 
listeners. However, the trading relation between the noise duration and 
rise-time cues was present and significant, F(1,8) = 21.0, p < .01. 

The results of Experiment 4b (fricative noise portions only) are labeled 
"b" in Figure 4. Performance was strikingly better here than in Experiment 
4a. Also, in contrast to Experiment 4a, a large advantage for 2-cue trials 
can be seen, both in the Between condition, F(1,8) = 19.7, p < .01, and in the 
Within condition, F(1,8) = 47.9, p < .001. The results had in common with 
those of Experiment 4a the Blocks by Cues interaction: The advantage for 
2-cue trials increased over blocks, particularly in the Between condition, 
F(2,16) = 11.3, p < .001. The interaction did not reach significance in the 
Within condition, where it may have been due to a ceiling effect in Block 1. 
The different patterning of this interaction in the two conditions was 
reflected in a significant Conditions by Blocks by Cues interaction, F(2,16) = 
6.4, p < .01. There was no effect involving Repetitions in the Within 
condition. 

Discussion 

The "chop"-"shop" stimuli were the most problematic ones of the present 
set. Not only was the phonetic contrast less clear-cut, but the author also 
noted as a pilot subject that the stimuli were prone to auditory segregation: 
After some minutes of listening, the fricative noise would suddenly "stream 
away" from the periodic portion, thereby destroying the speechlikeness and 
perceptual coherence of the stimuli. These observations are in accord with 
the results, which show little difference between the Between and Within 
conditions, suggesting that listeners may have made little or no use of 
phonetic labels in Between discrimination. The Blocks by Cues interaction may 
indicate that subjects made some use of phonetic labels in the first block of 
both conditions and abandoned this strategy later. This is not implausible in 
view of the possibility that the standard stimulus in the Within condition may 
not have been an unequivocal "shop"; it is also supported by the reports of 
some subjects who claimed to have heard a [ʃ]-[tʃ] contrast in the Within 
condition. However, this interpretation is called into question by the 
existence of a similar Blocks by Cues interaction in Experiment 4b, where 
phonetic labeling presumably played no role. We may presume, then, that the 
interaction reflects a change in auditory strategies: As long as differences 
in noise duration were large, listeners paid attention to that cue dimension, 
and only as the differences got smaller was their attention directed to the 
rise-time differences as well. 

Two aspects of the present results are clear. First, fricative noise 
duration and rise-time do not seem to engage in an auditory trading relation; 
otherwise, an advantage for 1-cue trials should have been observed in the 
Within condition, just as in Experiment 3. Therefore, the trading relation 
observed in the labeling task is likely to be phonetic in nature, and its 
failure to show up in Between discrimination may be ascribed to procedural 
factors and to the above-mentioned stimulus problems. Second, the periodic 
portion of the "chop"-"shop" stimuli seemed to interfere with auditory memory 
for the duration of the fricative noise, or with the perception of that 
duration in the first place: Discrimination was considerably easier when the 
noises were presented in isolation. Perhaps this difference reflects 
differently-sized auditory units; it might disappear when the noise is 
perceptually segregated from the periodic portion, either as the consequence 
of prolonged listening or of a listener-controlled strategy. However, Repp 
(1981a) found that fricative noises differing in spectrum (rather than 
duration) were more accurately discriminated in isolation than when followed 
by a periodic portion, even by subjects who were able to perceptually 
segregate the noise from the periodic portion. Thus, even though the stimulus 
components could be isolated by perceptual strategies, they were not 
completely independent in auditory memory. 



GENERAL DISCUSSION 

Even though the present results must be considered preliminary, they are 
encouraging, and the technique used promises to provide a relatively 
effortless way of determining the origin of a trading relation. The 
postexperimental labeling tests showed the expected trading relations in all 
cases (although it was not statistically reliable in two). Thus, the stimuli 
seemed appropriate, even though they had not been formally pretested. 
However, the expected trading relations were not consistently present in the 
Between discrimination conditions. In two studies ("say"-"stay," 
"goat"-"coat"), they showed up, but not very reliably; in the other two 
("say shop"-"say chop," "chop"-"shop"), they were definitely absent. The 
proposed reason for this was that the fixed-standard AX paradigm encouraged 
listeners to make maximal use of whatever auditory differences they could 
detect between the stimuli. For example, Carney, Widin, and Viemeister (1977) 
and Ganong (1977) successfully used the same paradigm to get subjects to 
discriminate small differences in VOT within a phonetic category. Auditory 
discrimination, in addition to discrimination based on phonetic labels, would 
tend to reduce the trading relation observed in the Between condition, unless 
the trading relation itself is of auditory origin. It also seems that 
differences in fricative noise duration were relatively salient, which may 
explain the absence of an advantage for 1-cue trials in the Between 
conditions for both "say shop"-"say chop" and "chop"-"shop." 

The critical data came from the Within conditions of the different 
experiments. In two studies ("say"-"stay," "say shop"-"say chop"), there was 
an advantage for 2-cue trials, which contrasted with the pattern of results in 
the Between condition. This outcome suggests strongly that the trading 
relations between the relevant cues are phonetic in origin, confirming earlier 
results by Best et al. (1981) for the "say"-"stay" contrast. These trading 
relations (between silent closure duration and F1 onset frequency in the case 
of "say"-"stay," and between silent closure duration and fricative noise 
duration in the case of "say shop"-"say chop") are well explained by reference 
to articulation, since in each case changes in the two cues are tightly 
correlated in the production of the relevant phonetic contrast. In a third 
study ("chop"-"shop"), the results were more ambiguous because similar results 
were obtained in the Between and Within conditions, and the advantage for 
2-cue trials was not as clear-cut. However, since a clear trading relation was 
obtained in the labeling task, the trading relation is likely to be of 
phonetic origin. The articulatory rationale applies here, too: Both fricative 
noise duration and rise-time change together in the production of the 
fricative-affricate contrast. Thus, three of the trading relations 
investigated appear to be phonetic in nature, and each of them has an 
articulatory explanation. 

Only the "goat"-"coat" stimuli yielded a different pattern. Here, there 
was an advantage for 1-cue trials in both the Between and Within conditions, 
suggesting an auditory origin for this trading relation. Significantly, this 
trading relation is also the only one that has no obvious articulatory 
correlates: Aspiration amplitude per se does not seem to vary in the voicing 
contrast for stop consonants. Thus, the present results fit the predicted 
pattern: A trading relation is phonetic in origin if it has articulatory 
correlates, but auditory in origin if it does not. 

The results of the Within conditions also tell us something about the 
auditory perception of speech parameters. In some cases ("say"-"stay," "say 
shop"-"say chop," isolated noises of "chop"-"shop"), the two cue dimensions 
seemed to be independent and simultaneously accessible to the subjects. In 
the case of "goat"-"coat," on the other hand, they seemed to interact. This 
difference is reminiscent of the distinction between "separable" and 
"integral" stimulus dimensions (Garner, 1974; Lockhead, 1970). Integral 
dimensions are those where, in order for one dimension to exist, the other 
must be specified, and where "selective attention to one dimension alone is 
not possible" (Garner, 1974). Aspiration noise duration and amplitude seem to 
fit that description. However, the pairs of cues involved in the "say"-"stay" 
and "say shop"-"say chop" distinctions do not; they seem to be separable at 
the auditory level, perhaps because they are also separated in time. In order 
to prove their auditory separability, it would be necessary to show that they 
can be selectively attended to, as Best et al. (1981) have done for 
"say"-"stay." The present task did not require selective attention, although 
it permitted such a strategy; the subjects, however, seemed to pay attention 
to both cue dimensions, which is certainly an option with separable cues. It 
is not clear where the "chop"-"shop" results stand in that regard; they are 
the ones most in need of replication. 



Even though two cues may be auditorily separable, it is significant that 
they can nevertheless be integrated into a single phonetic percept. 
Presumably, this is achieved by a higher-level, speech-specific process that 
combines cues according to implicit knowledge about the articulatory and/or 
acoustic patterns of speech. It is not necessary to envision this process as 
one of cue extraction followed by cue recombination according to certain rules 
(the traditional machine metaphor); more vaguely, but probably more 
appropriately, it may be understood as a consequence of perceiving 
articulatory change through the acoustic signal, and of referring the 
perceived changes to internal criteria that specify the phonetic categories 
of the language. If so, it seems likely that attempts to explain phonetic 
trading relations by auditory psychophysics will, in most cases, remain 
futile. 



PRODUCTION AND PERCEPTION OF PHONETIC CONTRAST DURING PHONETIC CHANGE 

Paul J. Costa and Ignatius G. Mattingly 



Abstract. Ten productions of each of the two words cod and card in 
the southeastern subdialect of the Eastern New England dialect, said 
to differ phonetically only in vowel length (and a number of foil 
words involving other phonetic contrasts), were recorded in a neutral 
carrier sentence by each of nine phonetically naive urban Eastern 
New England speakers unaware of the purpose of the investigation. 
Spectrographic measurements revealed fairly consistent differences 
in the vocalic segment durations of cod and card for most speakers, 
but no speaker could reliably identify his own intended productions 
(though identification of foils was perfect). Evidently a phonetic 
change is in progress, and our results suggest that during such a 
change, contrasts in production may persist after they have ceased 
to be perceptually relevant. 






It is usually taken for granted in phonetics that given a regular 
alternation in the production of two distinct lexical items, these two items 
will be perceived as different. Labov, Yaeger, and Steiner (1972), however, 
have reported instances of "partial mergers," in which, despite consistent 
acoustic differences between two phonetic types in a dialect, speakers of the 
dialect failed informal commutation tests. The purpose of this study is to 
report another instance in which the above assumption seems to be 
contradicted. The case in point is taken from the southeastern subdialect of 
the Eastern New England dialect (henceforth SENE), spoken in and around Fall 
River, Massachusetts. 


It has been stated by Thomas (1958) and by Kenyon (1937), on the basis of 
data collected in the 30's for the Linguistic Atlas of New England, that in 
the case of low vowels, vowel length is distinctive for SENE. For example, 
there is said to be only a vowel length distinction between the two words cod 
[kad] and card [ka:d]. This implies that the acoustic signals for such pairs 
may differ solely in the duration of the vocalic segment. We have found, 
however, that while there is a fairly reliable durational difference in the 
production of cod and card, speakers of this dialect cannot consistently label 
their own productions. In other words, the distinction in production is 
virtually ignored in perception. 

We recently carried out a production experiment and a perception 
experiment with nine SENE speakers. In the production experiment, we collected 
acoustic data from the informants with which the acoustic correlates of the 
vowel length distinction between cod and card could be measured. The 
materials for the production experiment consisted of five sets of two 
minimally paired common English words differing with respect to a single 
feature: goat-coat, grate-grade, spit-spin, bit-pit, and cod-card. The first 
four pairs acted as foils. Each word was put into one of three carrier 
sentences, so that it would be spoken at a natural tempo. A list in which all 
sentences appeared ten times, in random order, was prepared. The subject's 
task was to speak each sentence at a normal tempo. These utterances were 
recorded. 

In a subsequent meeting with each subject, which took place from one hour 
to two days after the production experiment, a perception test was given. The 
stimuli for each subject were the 100 sentences he had spoken in the 
production experiment. Word pairs other than card and cod remained as foils. 
Each subject was asked to write down the test word in each sentence. 

Wide band spectrograms were made of the ten tokens of cod and the ten 
tokens of card as spoken by each subject. Three successive durational 
measurements were noted: 1) the voice onset time for [k], 2) the vocalic 
duration measured from voice onset to the [d] closure, and 3) the closure 
duration for [d]. For each speaker the VOT and closure duration varied from 
token to token without a consistent pattern. On the other hand, the time 
measurements for the vocalic duration formed a rather consistent pattern. 
Measurements of vocalic duration plus VOT or closure duration, or both, were 
less consistent than vocalic duration. Vocalic duration averages for the ten 
speakers for cod ranged from [illegible] to 320 msec, while averages for card 
ranged from 240 to 400 msec. The difference in the speaker average ranged 
from 30 to 40 msec. In all, three subjects made a definite split in their 
productions, four subjects were moderately consistent, and two were very 
inconsistent. 

In order to pool the data in a way that would exclude, as far as 
possible, intersubject variation in speaking rate, we represented the vocalic 
duration of each token in signed units of standard deviation, using the 
average of each subject's durations for both cod and card as the mean for that 
subject. Thus, if each subject had produced all his tokens of card with 
longer durations than any of his tokens of cod, all card tokens would have 
greater signed values of standard deviation than any cod token. 
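
The pooling step just described can be sketched in a few lines of Python. 
This is only an illustration under stated assumptions: the function name 
signed_sd_units is hypothetical, the durations are invented rather than taken 
from the study, and the use of the population standard deviation of a 
speaker's pooled cod and card tokens as the unit is an assumption, since the 
text specifies only how the mean was chosen. 

```python
from statistics import mean, pstdev


def signed_sd_units(cod_ms, card_ms):
    """Express one speaker's vocalic durations in signed SD units.

    The mean is the average over the speaker's cod and card tokens together,
    as described in the text. Using the population SD of the same pooled
    tokens as the unit is an assumption of this sketch.
    """
    pooled = list(cod_ms) + list(card_ms)
    centre = mean(pooled)
    spread = pstdev(pooled)
    if spread == 0:
        raise ValueError("all durations are identical; SD units are undefined")

    def to_units(values):
        return [(v - centre) / spread for v in values]

    return to_units(cod_ms), to_units(card_ms)


# Hypothetical durations (msec) for one speaker, not data from the study.
cod_z, card_z = signed_sd_units([250, 260, 270], [290, 300, 310])
```

For a speaker who makes a clean split, as in this invented case, every card 
value lies above every cod value, which is the situation described in the 
preceding paragraph. 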

Figure 1 shows the data pooled in this way. The number of cod 
productions for a particular range of standard deviation values is plotted as 
a histogram above the horizontal axis. In the same way, card productions are 
plotted below the horizontal axis. While there is a substantial overlap, it 
is clear that the proportion of cod productions decreases, and the proportion 
of card productions increases, as the standard deviation goes from extreme 
negative values (corresponding to relatively short durations) at the left, to 
extreme positive values (corresponding to relatively long durations) at the 
right. Thus the production data are consistent with the vowel length 
distinction described by Thomas and by Kenyon; the two words do differ in 
vocalic duration in production. 

Perception is a different matter. Individual labeling results break the 
subjects down into three groups: two speakers with relatively consistent 
perception, four with inconsistent perception, and three with an overwhelming 
response bias towards one target or the other. 

Figure 1 also shows the pooled labeling data, correct responses being 
indicated by the darkened portion of each histogram and errors by the white 
portion. It is obvious that subjects are identifying the intended productions 
at chance level: they cannot distinguish cod from card. 

To determine whether subjects' judgments were influenced by vocalic 
duration, regardless of what they had intended as speakers, we re-plotted the 
same data according to perceptual judgments. In Figure 2 cod judgments are 
plotted above the horizontal axis and card judgments below. If duration had 
influenced these judgments, the proportion of cod responses would have 
decreased, and the proportion of card judgments would have increased from left 
to right with increasing values of standard deviation. But no such 
correlation appears. Not even the positive and negative extremes of standard 
deviation are consistently labeled. Thus we have evidence that a distinction 
reliably made in production has no effect upon perception. 

A possible explanation for this curious state of affairs is that, since 
the thirties, when the data were gathered on which Kenyon's and Thomas' 
descriptions were based, long and short /a/ have begun to merge in this 
dialect. If such a linguistic change were in progress, we might indeed expect 
to find that habits of production persisted after a distinction had ceased to 
have any linguistic significance. This would mean merely that speakers were 
wasting effort in distinguishing words that had effectively become homophones. 
Note that the converse possibility (a linguistic distinction maintained in 
perception but unsupported in production) is unlikely, since it would result 
in misunderstandings. 


The descriptions of Thomas and Kenyon, however, are based on 
impressionistic data, and we cannot be certain that the perceptual 
distinction existed even when the dialect was described. If it did not, then 
we cannot conclude that a change is in progress. 

But whether a change is in progress or not, there is another way to 
interpret this phenomenon. The pronunciations of such words as cod and card 
may function to mark a dialectal rather than a lexical difference. It would 
be interesting to determine whether subjects could make a dialect judgment on 
the basis of these words if, and only if, they knew what lexical items were 
intended. We intend to pursue this question further. 



DECAY OF AUDITORY MEMORY IN VOWEL DISCRIMINATION 

Robert G. Crowder 



Abstract. Nine experiments on same-different vowel discrimination 
are reported. In each the main variable was the duration of a silent 
delay between the two items being judged. As would be expected from 
the assumption that such judgments depend at least partly on 
auditory sensory memory, it was found that longer delays led to 
poorer discrimination than shorter delays. The auditory memory loss 
seems to be asymptotic at about three seconds, whether it is 
measured by correct discrimination or (as in one part of the second 
experiment) by the contextual influence of the first vowel on 
identification of the second. 



INTRODUCTION 

For all the work some of us have done on auditory sensory memory, we know 
very little about its time course. What evidence there is comes either from 
scattered reports using totally noncomparable methods or from experimental 
techniques that are not ideal for addressing the decay question. Still, most 
experts would probably agree that echoic memory does not remain available 
forever and that it decays slower than iconic memory. There have been two 
research programs that sought data relevant to the decay question, both using 
a form of masking to uncover properties of auditory memory. 

In Massaro's experiments related to this topic, for example (see Massaro, 
1970), a single tone selected from two possibilities is presented for a 
recognition response (high/low). The phenomenon of interest is that an 
unrelated masking tone presented just after the test stimulus impairs correct 
responding in a way that depends on the interval between target and mask. If 
the mask is delayed by about 250 msec, the response is unimpaired, but more 
immediate masks reduce performance considerably. It is the damage done by the 
mask that has led Massaro to infer the existence of auditory memory from this 
demonstration. In the stimulus suffix effect, discovered by Dallett (1965) 
and elaborated by Crowder and Morton (1969), the target memory trace is a 
hypothesized package of sound information about the last item in a 
memory-span type list. Performance on this last item is badly damaged by an 
extra word, the stimulus suffix, presented as if it were the next item in the 
list. The suffix can be semantically unrelated to the rest of the list and 
need not be recalled. Again, auditory storage is inferred from the 
vulnerability of the target (last memory item in the list) to masking from 
the suffix. 


In both the recognition masking of tones and the stimulus suffix 
paradigms, increasing the interval between the target and the mask leads to 
improved performance, up to a point. Massaro (1970) found this improvement 
reached asymptote at around 250 msec, and Crowder (1969) found that a suffix 
delayed by more than about 2 seconds had no effect on performance. Both these 
asymptotes were used as estimates of the duration of auditory memory (Crowder, 
1969, page 261; Massaro, 1972, page 129). The reasoning was that masking 
would become ineffective when the target information in the sensory store had 
decayed. Although this claim by itself is quite true, it is invalid to 
conclude anything about decay from the time at which masking becomes no longer 
effective: When the mask is delayed, in these paradigms, the sensory store 
might remain intact but meanwhile the subject has had the opportunity to 
recode the information in it; if the subject has been able to incorporate the 
information contained in the sensory trace into some more permanent format, 
then it makes no difference whether the mask does or does not destroy this 
information later. 

Watkins and Todres (1980; see also Watkins & Watkins, 1980, Experiment 6) 
have recently reported several experiments on delayed suffixes. They offered 
evidence that the interval before a delayed suffix was indeed being used by 
subjects for readout of auditory information into some more permanent form of 
memory. They also confirmed Crowder's (1971, page 339) speculation that decay 
might be very much slower than originally conjectured by Crowder and Morton. 
Watkins and Todres have correctly observed that whereas the absence of a 
suffix effect after some delay says nothing about whether the auditory trace 
survives that long, the presence of a suffix effect at some delay does suggest 
the survival of the trace for at least that long. They found that if they 
prevented subjects from engaging in readout of the target information during 
the delay (between target and suffix), an appreciable suffix effect was 
obtained after 20 seconds. 

Although the suffix experiment may thus be forced to yield acceptable 
inferences about minimal survival time of auditory memory, it is not ideal for 
this purpose. The portion of performance in the suffix experiment that is 
interesting for the analysis of auditory memory (relative performance on the 
last serial position) is superimposed on a background of highly complicated 
and strategy-prone short-term memory functions. For example, to demonstrate 
the suffix effect at a 20-second delay, Watkins and Todres had to engage the 
subjects in a lively mental arithmetic task between the last memory item and 
the suffix item. We know that even so mundane a task as remembering a series 
of meaningless items in order engages several types of mechanisms (grouping, 
cumulative rehearsal, efforts at mnemonic coding, articulatory loops, and so 
on), and many of these mechanisms are quite likely to interact with serial 
position. Accordingly, it would be a boon to be able to study auditory memory 
and its decay properties in the context of a simpler task. That is the 
purpose of the research reported in this paper. 



Pisoni (1973) has used a same-different speech discrimination task to 
study the decay of auditory memory (see also Repp, Healy, & Crowder, 1979; the 
background for these investigations is covered in Crowder, in press). In this 
task, the subject hears two speech sounds, perhaps vowels similar to the /i/ 
and /I/ in BEET and BIT. The two sounds are typically quite close to each 
other acoustically, so that perhaps they both sound like one or the other of 
these two phonetic segments. The subject must decide whether the two are 
identical physically or not. Especially in the case where both items sound 
like only one of the possible phonetic segments, the reasoning is that 
auditory memory must have some role in correct performance. Consider the 
subject receiving the second of the two items to be judged: If the second item 
has the same name (phonetic label) as the first, then the only way the subject 
can tell whether they are physically identical is by remembering the sound of 
the first until the second arrives. 


Pisoni (1973) set the delay between the two vowel stimuli at intervals 
from one-half to two seconds. He found that performance was poorer at the 
longer separations, as would be expected if the sound of the first item (its 
auditory memory trace) were decaying during the interval between stimuli. 
The logic of isolating sensory memory contributions through manipulation of 
delay in a successive discrimination task is not at all unconventional. 
Kinchla (1973) observes that such a task provides "...a rather direct approach 
to 'sensory memory' processes...". In his experiment, subjects heard a 
compound tone and then, after a variable delay, had to make a fine intensity 
discrimination between a single probe tone and its corresponding element in 
the original compound; performance steadily decreased from compound-probe 
intervals of one-half to two seconds. Hanson (1977) found poorer performance 
in a "physical match" (same/different) task with interstimulus intervals of 
570 msec than 150 msec, using stop-vowel CV syllables. 

The present experiments were planned to test other intervals than those 
Pisoni used, in order to get an estimate of decay rate in auditory memory. 
This research cannot settle whether the auditory memory believed to support 
same-different speech discrimination is the same auditory memory that has been 
studied in the suffix experiments (Precategorical Acoustic Storage). That 
question demands a different kind of experiment. However, the same-different 
discrimination task is obviously a more direct and simple context in which to 
measure the auditory store and thus a useful context in which to ask about 
decay. 

EXPERIMENT 1 

Experiment 1 comprised two parts. In the first, there were 10 main 
conditions defined by 10 stimulus onset asynchronies separating the two items 
on each trial. These were set at 0, 200, 400, 600, 800, 1000, 1200, 1400, 
1600, and 1800 msec. Since the vowels were 300 msec long, the first two of 
these conditions included physical overlap between the two items. It 
developed that at least one of the overlap situations was sharply inferior to 
the longer stimulus onset asynchrony conditions and subjects complained that 
they were confusing. Thus, after testing 20 subjects in the original design, 
we eliminated the two shortest stimulus onset asynchrony conditions and 
continued for another 20 subjects. 


Stimuli. The stimulus items were three-formant, steady-state, synthetic 
vowels similar to those used by Repp et al. (1979). These stimuli spanned the 
continuum from the vowel /i/ to /I/. The first formant center frequencies 
ranged from [illegible] to [illegible] Hz, the second from 2296 to 2030 Hz, 
and the third from 3011 to 2432 Hz, all in roughly logarithmic steps, for the 
continuum of eight. In this study, the fourth and fifth tokens were left out 
so as to enhance the contrast between within- and between-category decisions. 
The present set of vowels correspond to Stimuli 1, 2, 3, 6, 7, and 8 from 
Table 1 of Repp et al. (1979, page 139). The formant bandwidths were 63, 94, 
and 110 Hz, respectively. The vowels were 300 msec long and were produced on 
the Haskins Laboratories OVE IIIc synthesizer. Overall amplitude rose sharply 
over the first 50 msec, then remained uniform until a symmetric fall over the 
last 50 msec. Fundamental frequency declined gradually from 123 to 80 Hz 
throughout the utterances. 


A different test tape was prepared for each of the 10 stimulus onset 
asynchrony conditions. On each, there were 18 pairs of identical tokens (1-1, 
2-2, and so on, each repeated three times) where the correct answer was SAME. 
The other 42 pairs on each tape contained 16 "one step" DIFFERENT trials (1-2, 
2-1, 2-3 and so on), 8 "two step" pairs, and 18 more widely separated 
DIFFERENT pairs contrasting the /i/ items (1, 2, 3) with the /I/ items (6, 7, 
and 8). These 60 trial types were arranged on the tape in a different random 
order for each stimulus onset asynchrony. 
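
The composition of one such tape can be checked with a short Python sketch. 
It is an illustration only: the function name build_tape and the seed 
argument are hypothetical, and it assumes that each ordered within-category 
pair was presented twice, which is one way of arriving at the stated totals 
of 16 one-step and 8 two-step trials. 

```python
import random
from itertools import permutations

I_ITEMS = [1, 2, 3]       # tokens from the /i/ end of the continuum
SHORT_ITEMS = [6, 7, 8]   # tokens from the /I/ end of the continuum


def build_tape(seed=None):
    """Return a shuffled list of 60 (first, second) token pairs for one tape."""
    # 6 identical pairs, each repeated three times: 18 SAME trials.
    same = [(i, i) for i in I_ITEMS + SHORT_ITEMS] * 3

    # Ordered pairs of different tokens within each phonetic category.
    within = [pair for cat in (I_ITEMS, SHORT_ITEMS)
              for pair in permutations(cat, 2)]
    # Assumption: every ordered within-category pair occurs twice.
    one_step = [p for p in within if abs(p[0] - p[1]) == 1] * 2   # 16 trials
    two_step = [p for p in within if abs(p[0] - p[1]) == 2] * 2   # 8 trials

    # Every /i/ token paired with every /I/ token, in both orders: 18 trials.
    between = [(a, b) for a in I_ITEMS for b in SHORT_ITEMS]
    between += [(b, a) for a, b in between]

    trials = same + one_step + two_step + between
    random.Random(seed).shuffle(trials)   # a different random order per tape
    return trials


tape = build_tape(seed=0)
```

Under that assumption there are 12 distinct within-category DIFFERENT pairs 
and 18 distinct between-category pairs on each tape. 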

Design and procedure. The subjects in Part One (10 different stimulus 
onset asynchrony conditions including 0 and 200 msec) received their tapes in 
an order determined by a balanced Latin square (complete control over 
first-order sequential effects). The subjects in Part Two followed the same 
Latin square design but the tapes with 0 and 200 msec stimulus onset 
asynchrony were simply deleted; thus they had 120 fewer trials than the first 
squad of subjects. Instructions were explicit about the experimental details 
and stressed that the criterion for a "same" response was to be exact physical 
identity. 
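
A balanced Latin square of the kind mentioned here can be generated as in 
the following Python sketch. It uses the standard construction for an even 
number of conditions (a zigzag first row that is then shifted cyclically); 
whether the experimenters built their square this way is not stated, and the 
function name is hypothetical. 

```python
def balanced_latin_square(n):
    """Build an n x n balanced Latin square for an even number of conditions.

    Each row is one presentation order. Every condition appears once in each
    ordinal position, and every ordered pair of different conditions appears
    in immediate succession exactly once across the rows.
    """
    if n % 2 != 0:
        raise ValueError("this construction requires an even n")
    # Zigzag first row: 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    for k in range(1, n):
        if k % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    # Remaining rows are cyclic shifts of the first row.
    return [[(x + r) % n for x in first] for r in range(n)]


# Ten tape orders, indexing the 10 stimulus onset asynchronies as 0..9.
square = balanced_latin_square(10)
```

With 20 subjects in each part, each of the ten orders could be assigned to 
two subjects; that assignment is an assumption, not something the text 
reports. 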


Following each trial, there was a five-second pause before the next 
trial. There were no warning sounds to mark trials or response periods. The 
subjects had a numbered answer sheet with the letters s and d, which they were 
supposed to circle, indicating their response for that trial. A practice tape 
consisting of 6 sample trials was presented after the instructions. 

Subjects. The subjects were 40 college-age adults from the New Haven 
area, some Yale students serving as part of a course requirement and some 
volunteering to serve for pay. 

Results and Discussion 

The mean overall proportions correct for the two parts of Experiment 1 
(SAME and DIFFERENT trials combined) are shown in the first two rows of Table 
1 as a function of stimulus onset asynchrony. For this analysis, the two 
kinds of trials were not weighted (see the d' analysis, below). Two things 
are quite clear from inspection: There is some loss in discrimination as a 
function of delay, as we would expect from the Pisoni (1973) result. 
Secondly, the function is far from asymptotic over the range sampled here. 

Analysis of variance on these data confirmed the reliability of the basic 
delay effect. For this analysis the two parts were combined and only the data 
from stimulus onset asynchronies 400-1800 were included. To have included the 
0-msec delay would have produced a misleadingly high F ratio, as this 
condition was so extraordinarily poor relative to the others. For the 
combined data, the delay effect was highly significant statistically, 
F(7,273) = [illegible], p < .001. 

It can be objected that these data may be influenced to some unknown
degree by changes in response criteria for judging two items physically
identical, across the different delay conditions. Such arguments (see
Macmillan, Kaplan, & Creelman, 1977) make the case for analyzing the data in
terms of Statistical Decision Theory. Tables have recently become available
for transforming variable-standard same/different data into d' (Kaplan,
Macmillan, & Creelman, 1978).² The task is conceived as one where the subject
is set to detect "sameness," and one assumes false alarms to be the proportion
of SAME responses when the two items were in fact different. The d' values
relevant to this analysis are shown in the second two rows of Table 1. The
conclusions of the conventional analyses are completely sustained by the
transformed analysis of sensitivity. Analysis of variance based on the 10
supersubjects³ from both parts of the experiment on the conditions in common
(stimulus onset asynchrony 400 through 1900 msec) confirmed the reliability of
the delay effect, F(?, 63) = 3.92, MSe = .162, p < .05. Thus, no changing
criterion for "sameness" across different stimulus onset asynchrony values can
be held responsible for the declining performance observed here. Note that
although the bias is not changing over intervals in a way that produces the
decay effect, there is an overall, strong bias in responding. This is
indicated by the large d' values and the relatively low (about 70%) rates of
correct responding. The overall probability of saying SAME given that the two
stimuli were identical was very high, .919, and the corresponding rate of
false alarms, SAME given different, was .378. This same bias was observed in
the second experiment. To repeat, the important consideration is that a
changing bias cannot account for the result of interest here.
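
To illustrate how a sensitivity index is separated from response bias, the
following sketch computes the ordinary yes/no index d' = z(H) - z(F) from the
overall hit and false-alarm rates quoted above. This is an assumption made
purely for illustration: the analysis reported here used the same/different
tables of Kaplan, Macmillan, and Creelman (1978), which give different values,
and the function name is hypothetical.

```python
from statistics import NormalDist


def yes_no_dprime(hit_rate: float, false_alarm_rate: float) -> float:
    """Standard yes/no sensitivity index: z(hits) - z(false alarms).

    Assumption: this is NOT the same/different table lookup used in the
    paper; it is the textbook yes/no formula, shown for illustration.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Overall rates reported for Experiment 1 (SAME given same, SAME given different).
d_overall = yes_no_dprime(0.919, 0.378)
```

Because both rates enter through the same z transform, a uniform shift toward
saying "same" raises hits and false alarms together and leaves the index
largely unchanged, which is the property the argument above relies on.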

Although the majority of trials in this experiment contained items from
the same phonetic category, there were enough between-category pairs to
inspect for a difference between the size of the decay effect in between- and
within-category trials. This was done using stimulus pairs as the sampling
variable. For each of the 12 within-category pairs where the correct response
was "different," the number of errors made by all subjects on stimulus onset
asynchronies 400-1000 and 1200-1800 was tallied separately. The reliability
of the decay effect for the within-category data was verified by a paired
t test, t(11) = 2.83, p < .01. The same was done for the 18 between-category
pairs and again the decay effect was reliable, t(17) = 6.43, p < .001. Going
by the size of the t values, one might suppose the effect was larger for the
between-category pairs, and indeed the raw differences between the short- and
long-delay conditions were significantly larger for the between-category
pairs than for the within-category pairs, t(28) = 3.00, p < .005. However,
the between-category pairs all spanned a larger physical distance than the
within-category pairs and so there were many fewer errors in the former. If
one wishes to take the ratio of the difference between the short (S) and long
(L) intervals relative to the total errors made for a pair, (L - S)/(L + S),
the difference is reversed. By this latter measure, there was a significantly
larger delay effect in the between-category than in the within-category data,
t(28) = 5.?0, p < .005. The conclusion has to be that delay does not have a
larger effect on within-category pairs than on between-category pairs. This
was the outcome of the Pisoni (1973) and Repp et al. (1979) studies, too.
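
The normalized measure (L - S)/(L + S) can be sketched as follows. The error
counts in the example are hypothetical and are not taken from the experiment;
they only show that a pair with few total errors can have a small raw
difference and still a large relative delay effect.

```python
def relative_delay_effect(short_errors: int, long_errors: int) -> float:
    """Return (L - S) / (L + S); positive values mean more errors at long delays."""
    total = long_errors + short_errors
    if total == 0:
        raise ValueError("pair produced no errors at either delay")
    return (long_errors - short_errors) / total


# Hypothetical error counts (illustration only, not data from the study).
many_errors_pair = relative_delay_effect(short_errors=60, long_errors=80)  # raw difference 20
few_errors_pair = relative_delay_effect(short_errors=5, long_errors=15)    # raw difference 10
```

Dividing by the total keeps the measure between -1 and +1 regardless of how
difficult a pair is overall.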

The component of performance in discrimination that can be assigned to
auditory memory has quite plainly not reached asymptote by the longest
interval tested in Experiment 1. The main purpose of Experiment 2 was to
expand the range of intervals tested.

The stimulus onset asynchrony values used in this second study were 500,
1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000 msec. In most other
respects the experiment was similar to Experiment 1 except for one additional
feature: Experiment 2 also included a complete run through the materials for
each subject in which identification, rather than discrimination, was
measured. Repp et al. (1979) had found that the items within a pair exerted
roughly symmetrical, contrastive effects on labeling. That is, they observed
that when vowels in a pair were being labeled, phonemically, as /i, I, or ɛ/,
the identity of the other pair member influenced the item being labeled. The
effect was contrastive, which means that if an ambiguous vowel between /i/ and
/I/ were presented, hearing it in the context of an unambiguous /i/ made it
sound more like /I/. That the effect was symmetrical means that the first
vowel in the pair influenced the second about as much as the other way around.
Repp et al. suggested that these contrast effects on phonetic labeling were
produced by mechanisms located in auditory memory, because in conditions where
auditory memory was removed by delay or by masking, little contextual
influence was found.

By analogy, the contrastive effects on phonetic labeling can be compared
with visual brightness contrast: a given shade of gray appears brighter if it
occurs in the context of a dark background than if it appears in a light
background. For any successive contrast to work, it might be suggested that
the two items would have to reside together in memory. If so, then one can
understand why Repp et al. found less contrast when they compromised the
auditory storage of items during the interstimulus interval. It follows that
contrast effects could be used as an independent measure of the duration of
auditory memory.

A word should be added about what causes contrastive context effects. The
generalization of importance is that one vowel affects the label applied to
another, provided they are different and provided they occupy auditory memory
together. In recent publications (Crowder, 1978, in press) I have begun to
advance a theory that covers these findings. The central assumption relevant
to contrast effects (Crowder, in press) is that auditory-memory
representations interact by frequency-specific inhibition of each other. That
is, if auditory memory representations of two items occur close together in
time, and on the same channel, they will tend to inhibit each other and this
inhibition will be greatest in spectral regions where they contain overlapping
energy. If two vowels are similar, except for the placement of one or two
formants, this frequency-specific inhibition will produce contrast. The
formants associated with the vowels used here have very considerable overlap
relative to their center frequency differences. This means that two vowels'
formants will have an area of intersection, and also each will have an area
not in common with the other. If inhibition between them is frequency
specific, the intersection in the two formants will suffer the most, leaving
the non-intersecting formant area in each vowel relatively intact. Since the
non-intersecting regions were what made the two vowels distinctive in the
first place, eliminating the region in common will enhance their
distinctiveness mutually, and will lead to contrastive identification. See
Crowder (in press) for further explanation. This interpretation is consistent
with a theory that applies equally well to the suffix and vowel-discrimination
tasks and covers essentially all known evidence on the suffix effect (Crowder,
1978).
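
The logic of frequency-specific inhibition can be illustrated with a toy
computation. Everything in the sketch is an assumption made for illustration:
the Gaussian formant shape, the center frequencies and bandwidth, the
inhibition strength, and the use of a spectral centroid as a stand-in for
perceived vowel quality. It is not the model specified in Crowder (in press).

```python
import math


def formant(center_hz, bandwidth_hz, freqs):
    """Gaussian bump standing in for one formant's energy distribution."""
    return [math.exp(-0.5 * ((f - center_hz) / bandwidth_hz) ** 2) for f in freqs]


def inhibit(own, other, strength=0.5):
    """Remove a fraction of the energy that the two spectra have in common."""
    return [a - strength * min(a, b) for a, b in zip(own, other)]


def centroid(spectrum, freqs):
    """Energy-weighted mean frequency of a spectrum."""
    return sum(f * e for f, e in zip(freqs, spectrum)) / sum(spectrum)


freqs = list(range(500, 1801, 10))   # frequency grid in Hz (hypothetical)
low = formant(1100, 100, freqs)      # hypothetical formant of one vowel
high = formant(1200, 100, freqs)     # hypothetical formant of the other vowel

low_after = inhibit(low, high)
high_after = inhibit(high, low)

separation_before = centroid(high, freqs) - centroid(low, freqs)
separation_after = centroid(high_after, freqs) - centroid(low_after, freqs)
```

Because the shared region lies between the two peaks, removing energy there
pushes each centroid away from the other, so the separation after inhibition
exceeds the separation before it.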



Stimuli. A different set of vowels was used in Experiment 2. This was
primarily in order to increase the generality of the research program. The
13-item continuum used in this study crossed the vowel space in such a path as
to isolate approximate prototypes of /ɑ/, /ʌ/, and /æ/, which correspond to
the vowel sounds in COT, CUT, and CAT, respectively. To achieve this, the
formant frequencies shown in Table 2 were set on the OVE III synthesizer.
Included in Table 2 are the overall identification data when each of the
thirteen tokens was presented with itself (that is, on SAME trials),
collapsed over inter-item delays. These data show that the subjects were
quite willing to accept this as a three-vowel continuum. In other respects,
the stimulus items were similar to those of Experiment 1.

Each test tape contained 34 pairs, of which 13 were SAME trials (1-1,
2-2, ..., 13-13), 11 were two-step DIFFERENT trials (1-3, 2-4, ..., 11-13),
and 10 were three-step DIFFERENT trials (1-4, 2-5, ..., 10-13). It was
arbitrarily decided to use only DIFFERENT trials that ascended in terms of the
numbering of Table 2 (that is, 1-4, but not 4-1). These pair types were
randomly ordered 10 times and placed on tapes otherwise differing only in the
stimulus onset asynchrony: 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000,
4500, or 5000 msec. The interval between trials was 4 seconds.
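
The pair types described above can be enumerated with a short sketch, which
also confirms the counts of 13, 11, and 10. The function name is hypothetical;
only the ascending-pair rule and the item count come from the text.

```python
def build_pairs(n_items: int = 13):
    """Return the SAME, two-step, and three-step pair types for one tape.

    DIFFERENT pairs are ascending only (for example 1-4 but not 4-1).
    """
    same = [(i, i) for i in range(1, n_items + 1)]
    two_step = [(i, i + 2) for i in range(1, n_items - 1)]
    three_step = [(i, i + 3) for i in range(1, n_items - 2)]
    return same, two_step, three_step


same, two_step, three_step = build_pairs()
pair_types = same + two_step + three_step  # the 34 pair types on each tape
```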

Design and procedure. Subjects went through the 10 tapes twice, first in
an identification experiment and second in a same/different discrimination
experiment. In the former, they were instructed to listen carefully to the
second stimulus in each pair and to identify it by circling one of three
words (COT, CUT, or CAT) on a numbered answer blank. It was expected that the
first item in each pair would provide a contextual influence on this labeling,
to the extent the two items occupied auditory memory together. The 10 tapes
were presented in a balanced Latin square order.
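
One common way to build a balanced Latin square for an even number of
conditions is sketched below. The source does not say which construction was
used, so this particular scheme (a zigzag first row, shifted cyclically) and
the function name are assumptions. In the resulting square each tape appears
once in every serial position and each tape immediately follows every other
tape exactly once.

```python
def balanced_latin_square(n: int):
    """Balanced Latin square for even n using the zigzag construction.

    The first row is 0, 1, n-1, 2, n-2, ...; each later row adds 1 (mod n).
    """
    if n % 2 != 0:
        raise ValueError("this construction requires an even number of conditions")
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    return [[(x + shift) % n for x in first] for shift in range(n)]


# Ten tape orders, one per row; entries 0-9 index the ten SOA tapes.
tape_orders = balanced_latin_square(10)
```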

In the second part of the experiment, the same 10 tapes were presented to
each subject in the reverse order to that used in the first part. Here, the
instructions were to make a same/different judgment for each pair based on the
same criteria explained in the previous experiment. Again, a practice tape
was used to provide familiarity with the sounds.


TABLE 2

                      Formant structure (Hz)     Labels on SAME trials
Stimulus
number    Prototype     F1      F2      F3       /ɑ/     /ʌ/     /æ/

   1        /ɑ/          ?     1091    2431      .969    .015    .015
   2                    713    1107    2431      .964    .031    .005
   3                    702    1123    2431      .967    .023    .010
   4                    637    1??9    2431      .918    .072    .011
   5                    668      ?     2431      .667    .323    .010
   6                    633    1??2    2396      .536    .459    .005
   7        /ʌ/         639    1??9    2396      .182    .815    .003
   8                    644    ???9    2396      .026    .933    .041
   9                    644      ?     2396      .023    .795    .182
  10                    649    1436    2396      .010    .436    .554
  11                    633    1343    2413      .005    .221    .774
  12                    658    1633    2413      .003    .038    .969
  13        /æ/         638    1719    2413      .003    .005    .990

Note. A question mark stands for a digit or entry that is illegible.



Subjects. The subjects were 40 young adults from the same source as in
Experiment 1.

Results and Discussion: Discrimination

The discrimination results are given in Figure 1, which shows the overall
proportion of correct same/different judgments as a function of stimulus onset
asynchrony. As in the first experiment, performance began to drop sharply
between one and two seconds. However, the figure shows little change after
three seconds, suggesting that auditory memory, to the extent it represents
a decaying source of information for same/different responding, has been
lost by three seconds.

The same picture is provided by the d' analysis shown in Figure 2. If
anything, the results are cleaner when corrected this way for possible
criterion artifacts. Statistical analysis confirmed the reliability of the
findings in Figures 1 and 2. Separate analyses of the untransformed error
types, "same" on DIFFERENT trials and "different" on SAME trials, showed that
each component of the pooled errors in Figure 1 was statistically significant,
Fs(9, 351) = ?.82 and 6.49, respectively, MSe's = .282? and ?, ps < .0001.
Analysis of variance on d' again used supersubjects of four individuals each.
There were 10 of these supersubjects and the d' variance associated with
stimulus onset asynchrony was highly significant, F(9, 81) = ?.55,
MSe = 2163.74, p < .001. As in Experiment 1, there was no evidence that the
delay was more potent for the within- than for the between-category pairs. In
this study, the identification results provided only five pairs that could
convincingly be called within-category (1-3, 1-4, 2-4, 7-9, and 11-13; see
Table 2). One of these showed reduced errors from the short- to the
long-interval conditions while the other four showed increased errors. The
between-category pairs showed reliable and consistent delay effects, however,
t(15) = 3.78, p < .005. As before, the auditory component was not by any
means restricted to the cases where the items being discriminated match in
phonetic category.

Performance remained quite good even after the component being attributed
here to auditory memory had decayed to asymptote. However, not too much
importance should be attached to the specific levels of correct responding.
These reflect, among other things, the mixture of easy, three-step
discriminations (where performance ranged from .875 to .825) and the more
difficult two-step discriminations (.670 to .580). Furthermore, there was a
strong bias for responding "same," as is evident in the correct "same"
responses on trials where the two items were identical, where hits ranged from
.9?? in the 500-msec stimulus onset asynchrony condition to .875 in the
?000-msec condition. Corresponding "same" responses on DIFFERENT trials
ranged from .235 in the 500-msec condition to .312 in the 3000-msec condition.
The mean proportions in Figure 1 thus represent none of the exact performance
levels obtained. The important thing of course is the regularity of the data
and not absolute levels of accuracy.

Figure 1. Proportion of correct responses, overall, in Experiment 2, as a
function of stimulus onset asynchrony (500 to 5000 msec).

Figure 2. Same/different discrimination sensitivity (d') as a function of
stimulus onset asynchrony in Experiment 2. Each point represents
performance of ten supersubjects based on four individuals apiece.


TABLE 3

Proportion correct same/different discrimination (combined SOAs)

                              DIFFERENT
      SAME             Two-Step            Three-Step
 Pair   Proportion   Pair   Proportion   Pair   Proportion

  1-1     .943        1-3     .140        1-4     .373
  2-2     .947        2-4     .187        2-5     .637
  3-3     .940        3-5     .485        3-6     .735
  4-4     .937        4-6     .515        4-7     .747
  5-5     .670        5-7     .490        5-8     .883
  6-6     .880        6-8     .767        6-9     .957
  7-7     .855        7-9     .665        7-10    .987
  8-8     .920        8-10    .643        8-11    .973
  9-9     .917        9-11    .825        9-12    .975
 10-10    .917       10-12    .867       10-13    .9?5
 11-11    .897       11-13    .865
 12-12    .910
 13-13    .957


Correct performance was also influenced by the particular items being
discriminated along the continuum from /ɑ/, /ʌ/, through /æ/. Table 3 shows
the proportion correct overall for each of the SAME and DIFFERENT pairs used
in the experiment. Quite clearly, the /æ/ end of the continuum was easier
than the /ɑ/ end. These differences reflect no doubt the spacing of tokens
shown in Table 2. However, the important question is whether the main decay
results were general across these stimuli, which differed widely otherwise in
discrimination difficulty. The answer is reassuring. Among the 13 types of
SAME trials (1-1, 2-2, and so on) performance at the shortest interval was
better than performance at the longest interval in ten cases, with one tie and
two reversals, p = .019 by a sign test. Among the 21 DIFFERENT trial types,
there were 17 pairs showing the same difference, with one tie and three
reversals, p = .006 by a sign test. Thus the extreme variability in pair
difficulty is another reason for skepticism about the absolute values of the
means shown in Figures 1 and 2, but it does not discount the generality of the
time profile shown there.
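
As a sketch of how a sign test of this kind is computed, the function below
returns the one-tailed binomial probability of observing at least the stated
number of cases in the predicted direction, ties excluded. Treating the test
as one-tailed is an assumption, as is the function name; the source does not
describe the computation. With 10 of 12 nontied SAME pair types the result
rounds to the reported .019.

```python
from math import comb


def sign_test_one_tailed(successes: int, n_nontied: int) -> float:
    """P(X >= successes) for X ~ Binomial(n_nontied, 0.5)."""
    tail = sum(comb(n_nontied, k) for k in range(successes, n_nontied + 1))
    return tail / 2 ** n_nontied


# 13 SAME pair types: 10 in the predicted direction, 1 tie, 2 reversals.
p_same = sign_test_one_tailed(10, 12)
```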

One might very well wonder whether the group asymptote of 3000 msec is
representative of the performance of many individual subjects. The analyses of
variance reported here insure that the decay effect generalized across
variability due to subjects, and evidence has been presented, above, for such
generality across items. But the generality of the asymptote requires
stronger arguments. There are not enough data for each subject to calculate
individual regressions of performance on delay. However, the d' values for
the ten supersubjects could be inspected across the ten delays for that
purpose. As a rough estimate of where these ten functions reached asymptote,
the interval with the lowest d' was determined. For one supersubject, this
minimum was at 2500 msec stimulus onset asynchrony, for another, it was at
5000, and for two each of the remaining eight, it fell at 3000, 3500, 4000,
and 4500 msec. This nearly rectangular distribution of the minima is
consistent with the generalization that performance does not change after
3000 msec.



Results and Discussion: Identification

The identification results from SAME trials have already been displayed
in Table 2. These data are collapsed over stimulus onset asynchrony but, as
will be seen presently, stimulus onset asynchrony did not matter for the SAME
trials. The identification data of Table 2 show there were two boundaries:
that between /ɑ/ and /ʌ/ falling between stimuli 6 and 7 and the one between
/ʌ/ and /æ/ falling between stimuli 9 and 10. The question is now whether
these boundaries shifted when subjects were identifying the exact same tokens
but in the context of a prior item from "higher up" on the numbered continuum
of Table 2 (recall that the prior context always came from this direction).
To replicate the Repp et al. finding of contrast, the present results would
have to show that a given token sounded as though it came from "lower down" on
the continuum if it occurred on a DIFFERENT trial than if it came on a SAME
trial. In terms of boundary locations, this means the boundaries would shift
in the opposite direction, to a smaller numerical value.

The data relevant to this point are shown in Figure 3, which gives a
summary of context effects. Here, the data of Table 2 on identification are
broken down into the different stimulus onset asynchrony conditions, grouped
by two's for stability. Boundaries were estimated by linear regression⁴ on
stimuli 3-8 for the /ɑ/-/ʌ/ transition and on stimuli 8-12 for the /ʌ/-/æ/
transition. The two boundaries associated with the three phonetic segments
are collapsed in such a way that the numerical boundary measures on the
vertical axis show the mean stimulus number of the two boundaries.
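
A boundary estimate of this general kind can be sketched as an ordinary
least-squares line fitted to labeling proportions, with the boundary taken as
the stimulus number at which the fitted line crosses .50. The choice of
dependent variable (the proportion of first-category labels), the .50
criterion, and the function names are assumptions; the source does not spell
them out. The example uses the SAME-trial proportions for stimuli 3-8 from
Table 2.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def boundary(xs, ys, criterion=0.5):
    """Stimulus value at which the fitted line reaches the criterion."""
    slope, intercept = fit_line(xs, ys)
    return (criterion - intercept) / slope


stimuli = [3, 4, 5, 6, 7, 8]
# Proportion of first-category labels on SAME trials (Table 2, stimuli 3-8).
p_first = [0.967, 0.918, 0.667, 0.536, 0.182, 0.026]
estimate = boundary(stimuli, p_first)
```

A contrastive context effect would appear as a smaller value of this estimate
on DIFFERENT trials than on SAME trials.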

Figure 3 shows clearly that for SAME trials, the stimulus onset
asynchrony made no difference. However, on DIFFERENT trials, boundary
locations shifted in the expected direction, toward lower numerical values,
when there had been a recent context item. If the SOA was longer than
three seconds, it was as if there had been no context at all, but at shorter
intervals, context changed the labels applied to the second member of the
pair. The convergence of phonetic labeling on SAME and DIFFERENT trials at
three seconds is consistent with the suggestion that contrast operates when
the two items in question occupy auditory memory together. The particular
time interval at which these data converge is in approximate agreement with
the estimate of asymptotic decay that was based on discrimination, reported
above.

Statistical analyses confirmed the reliability of the picture presented
in Figure 3. For the short stimulus onset asynchronies combined (500 msec
through and including 2500 msec), 26 out of 37 nontied subjects placed stimuli
6 and 9⁵ farther down the numbered continuum on DIFFERENT trials than on SAME
trials, p = .01. The context effect was surprisingly general across stimuli
as well as across subjects. For the short and long intervals, as defined
above, each of the 11 stimuli labeled in a DIFFERENT context (numbers 3
through 13) was given a mean "placement score" along the continuum. This
placement was simply a weighted average of the three phonetic labels assigned
by subjects.⁶ The same placement score was available from the SAME trials.
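
The placement score can be sketched as follows, using the coding of the three
labels as 1, 2, and 3 described in footnote 6. Dividing by 3 to compress the
score so that its minimum is .33 is an assumption about how the compression
was done, and the function name is hypothetical. The example uses the
SAME-trial labeling proportions for stimulus 7 in Table 2.

```python
LABEL_VALUES = (1, 2, 3)  # first, second, and third vowel category


def placement_score(proportions, compress=True):
    """Weighted average of label values; optionally divided by 3.

    proportions: share of responses given to each of the three labels.
    """
    weighted = sum(v * p for v, p in zip(LABEL_VALUES, proportions))
    score = weighted / sum(proportions)
    return score / 3 if compress else score


# Stimulus 7 on SAME trials: .182, .815, .003 (Table 2).
stimulus_7 = placement_score((0.182, 0.815, 0.003))
```

Comparing this score for the same stimulus on SAME and DIFFERENT trials gives
the per-item context effect analyzed in the text.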

The question was whether a given stimulus item would receive lower (that
is, farther down the list) placement on the DIFFERENT trials than on the SAME
trials. At the short intervals, this was the result for 9 of the 11 items,
p = .033. Furthermore, 10 of the 11 items also showed the full pattern of
Figure 3, a bigger directional difference between SAME and DIFFERENT trials
at the short than at the long intervals, p = .006. Thus, the contrastive
context effects on labeling generalize both across subjects and across
individual vowel tokens.

It is somewhat surprising that the context effects proved so consistent
across stimuli. One would have expected primarily the ambiguous items to show
influence of context. Therefore, further analyses were undertaken to examine
the relation between the degree of the context effect and the position of a
stimulus on the continuum. For this purpose, only the short (500 to 2500
msec) stimulus onset asynchrony data were used. For each of the 11 stimuli
that were labeled on DIFFERENT trials, two placement scores were compared, one
on SAME trials and one on DIFFERENT trials. A positive difference means the
vowel in question showed different phonetic labeling, in the predicted
direction, when it followed another vowel from the continuum. These
differences in placement are shown in Figure 4 in arbitrary numbers that
reflect calculation of placement scores. The figure makes obvious that,
although all but two items showed a "positive" context effect, as reported
above, the size of that context effect was related in an interpretable fashion
to the category boundaries derived from SAME trials. There were two peaks in
the contextual influence and they coincide closely with the two category
boundaries. In other words, as one might expect, it was the ambiguous items
that were most susceptible to context.

Figure 4. The relation between stimulus number and the size of context
effects on labeling. A high positive score means a particular
vowel was labeled as coming from farther away from its prior
context vowel than it would have been if that prior context had
been the same vowel itself. The arrows show combined category
boundaries for SAME trials.

GENERAL DISCUSSION

The main goal of these studies was to provide parametric data on the
decay of auditory sensory memory. The results give a consistent estimate that
this decay is asymptotic at close to three seconds, for the successive
discrimination task used here. The phonetic labeling data of Figure 3 show
another manifestation of auditory memory, context influence on
identification, and this influence disappears at just the same time.

Experiments using related techniques to investigate memory for tones (for
example, Harris, 1952; Moss, Myers, & Filmore, 1970) do not necessarily
converge on the same estimate; however, there are typically not enough
intervals studied in these experiments to establish an asymptote, and, even if
there were, the stimuli and tasks are different enough to discourage
comparison. On the other hand, the estimate of three seconds is close to the
value suggested by Crowder and Morton (1969), even though that estimate was
only a shot in the dark.¹

Although the high performance levels in these experiments demonstrate
that other factors besides transient auditory memory support performance in
this task setting, it is a relatively uncomplicated task compared to the
suffix experiment. If further research suggests that the successive vowel
discrimination task used here taps the same auditory memory store that has
been so extensively studied in the suffix experiment, it may be advisable to
focus on the former rather than the latter in future work because it is so
much more direct a method. Perhaps the least encouraging evidence on this
point is the finding of Watkins and Todres (1980) and of Watkins and Watkins
that suffix-like effects occur following filled delays of up to 20
seconds. It will be for further research to clarify what are the boundary
conditions on this delayed suffix effect and to establish whether it has the
same functional properties as the immediate suffix effect, such as sensitivity
to phonetic class and to physical source channel.

The most intuitively plausible model for how auditory memory is used in
speech discrimination is that subjects try first to make a same/different
decision based on phonetic labels and, only after that has failed, go on to
consult auditory memory. The rule is "If the two sounds have different names,
say 'different,' otherwise compare the sounds themselves." This model (see
Crowder, in press, and Pisoni, 1973, for details) is apparently wrong. It
anticipates that effects owing to auditory memory would be stronger in the
within-category discriminations than in the between-category discriminations.
Neither the present studies, the results of Pisoni (1973), nor those of Repp
et al. (1979) gave evidence for the predicted interaction.
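
The two-stage rule, and the interaction it predicts, can be made concrete with
a minimal sketch. The numeric trace values, the threshold, and the use of None
to stand for a decayed trace are illustrative assumptions, not part of the
model as published.

```python
def two_stage_judgment(label_a, label_b, trace_a, trace_b, threshold=0.0):
    """Respond 'different' on a label mismatch; otherwise compare auditory traces.

    trace_a / trace_b are hypothetical numeric auditory codes, or None when
    the trace has decayed and is no longer available.
    """
    if label_a != label_b:
        return "different"  # stage 1: phonetic labels settle the matter
    if trace_a is None or trace_b is None:
        return "same"       # no trace left: only the matching labels remain
    # Stage 2: consult auditory memory.
    return "different" if abs(trace_a - trace_b) > threshold else "same"


# Between-category pair: delay (trace loss) cannot change the response.
between_short = two_stage_judgment("COT", "CUT", 6.0, 8.0)
between_long = two_stage_judgment("COT", "CUT", None, 8.0)

# Within-category pair: the response depends entirely on the trace.
within_short = two_stage_judgment("COT", "COT", 1.0, 3.0)
within_long = two_stage_judgment("COT", "COT", None, 3.0)
```

Under this rule a delay can hurt only the within-category pairs, which is the
interaction that the data fail to show.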

Perhaps subjects adopt some private categorical discrimination that does
not match the conventional phonetic categories but nonetheless serves a
similar role in performance on the within-category pairs. After listening to
the items in the stimulus ensemble for some time, subjects might very well
come to relabel the two sounds. In that case, with "functional categories"
constructed for the nominal within-category pairs, it would not be so
surprising that there was about the same auditory influence in the within- and
between-category judgments.



Does the three-second estimate from this research suggest any functional
role for auditory memory outside the narrow task confines of this procedure?
Of course, there are now only the most preliminary efforts to connect laws of
information processing to real-time language processing. However, Stevens
(1978, page 14) has noted the relationship between sentence-length utterances
and breathing. He observes a close relation between syntactic structure and
the pauses introduced by a speaker for the inspiration of breath. As Stevens
notes, mechanics of breathing limit sentences, or other major syntactic
structures, to a length of not more than two or three seconds. Thus, the
three-second figure is of some linguistic interest in a way that could be
related to speech production or comprehension. But this comment is no more
than suggestive: for one thing, the echoic decay estimate comes from a
situation where the trace is held in complete silence, whereas the two- to
three-second limit associated with the breath group is typically filled with
speech.


FOOTNOTES

¹Our statement (Crowder & Morton, 1969, page 366) was that a store lasting at
least on the order of a "few seconds" would be adequate for the role we had
proposed for auditory memory.

²Sorkin (1962) has shown why a straightforward application of standard
tables to the same/different situation is inappropriate.

³For purposes of getting d' values, subjects were combined into
"supersubjects" of n = 4 because many individual hit rates were close to or at
1.00. The data look essentially the same whether overall hit and false
alarm rates are taken before calculation of d' or these rates are calculated
for each supersubject. For the purpose of statistical tests on d' values,
however, it is convenient to set up the supersubjects first.

⁴A check on the data of Table 2 will verify that performance on labeling
within these ranges was unambiguously linear for the group data.

⁵These two stimuli, 6 and 9, were chosen because they represent performance
from items just prior to the two respective boundaries on the identification
test and therefore should represent ambiguous stimuli, especially subject to
contextual effects.

⁶Specifically, the three identification responses /ɑ, ʌ, æ/ were assigned
the numbers 1, 2, and 3, respectively. The total response to a stimulus for a
given subject could then be characterized as an average of the numbers
assigned to it. These averages were then compressed to a range from .33 to
1.00 for analysis.


THE EMERGENCE OF PHONETIC STRUCTURE

Michael Studdert-Kennedy*



Abstract. To explain the unique efficiency of speech as an acoustic
carrier of linguistic information and to resolve the paradox that
units corresponding to phonetic segments are not to be found in the
signal, consonants and vowels were said to be "encoded" into
syllabic units. This approach stimulated a decade of research into
the nature of the speech code and of its presumably specialized
perceptual decoding mechanisms, but began to lose force as its
implicit circularity became apparent. An alternative resolution of
the paradox proposes that the signal carries no message: it carries
information concerning its source. The message, that is, the
phonetic structure, emerges from the peculiar relation between the
source and the listener, as a human and as a speaker of a particular
language. This approach, like its predecessor and like much recent
work in child phonology and phonetic theory, takes the study of
speech to be a promising entry into the biology of language.

The earliest claim for the special status of speech as an acoustic signal
sprang from the difficulty of devising an effective alternative code to use in
reading machines for the blind. Many years of sporadic, occasionally
concentrated effort have still yielded no acoustic system by which blind (or
sighted) users can follow a text much more quickly than the 35 words a minute
of skilled Morse code operators. Given the very high rates at which we handle
an optical transform of language, in reading and writing, this failure with
acoustic codes is particularly striking. Evidently, the advantage of speech
lies not in the modality itself, but in the particular way it exploits the
modality. What acoustic properties set speech in this privileged relation to
language?

The concept of "encodedness" was an early attempt to answer this question
(Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). Liberman and his
colleagues embraced the paradox that, although speech carries a linguistic
message, units corresponding to those of the message are not to be found in
the signal. They proposed that speech should be viewed not as a cipher on
linguistic structure, offering the listener a signal isomorphic, unit for
unit, with the message, but as a code. The code collapsed the phonemic
segments (consonants and vowels) into acoustic syllables, so that cues to the
component segments were subtly interleaved. The function of the code was to
finesse the limited temporal resolving power of the ear. We typically speak
and comfortably understand speech at a rate of 10-15 phonemes/second, close to
the rate at which discrete elements merge into a buzz. By packaging
consonants and vowels into syllabic units, the argument went, we reduce this
rate by a factor of two or three and so bring the signal within the resolving
range of the ear.

This complex code called for specialized decoding mechanisms. More than
a decade of research was devoted to establishing the existence of a
specialized phonetic decoding device in the left cerebral hemisphere and to
isolating the perceptual stages by which the supposed device analyzed the
syllable into its phonetic components. This information-processing approach
to speech perception exploited a variety of experimental paradigms that had
seemed valuable in visual research (see Darwin, 1976, and Studdert-Kennedy,
1976, 1980, for reviews), but led eventually to a dead end, as it gradually
became apparent that the undertaking was mired in tautology. A prime example
was the proposal to "explain" sensitivity to features, whether phonetic or
acoustic, as due to feature-detecting devices, and to look for evidence of
such mechanisms in infants.

Current research has drawn back and is now moving along two different,
though not necessarily divergent, paths. The first bypasses the problems of
segmental phonetic perception and focuses on what some believe to be the more
realistic problem of describing the contributions of prosody, syntax, and
pragmatics to understanding speech. The second path, with which I am
concerned, reverses the procedure of the earlier encoding approach. Instead
of assuming that linguistic units should somehow be represented as segments in
the signal and then attempting to circumvent the paradox of their absence by
tailoring a perceptual mechanism for their extraction, the new approach simply
asks: What information does the speech signal, in fact, convey? If we could
answer this question, we might be in a position not to assume and impose
linguistic structure, but to describe how it emerges.



Consider the lexicon of an average middle-class American child of six
years. The child has a lexicon of about 13,000 words (Miller, 1977), most of
them learned over the previous four years at a rate of 7 or 8 a day. What
makes this feat possible? Of course, the child must want to talk, and the
meanings of the words she learns must match her experience: cat and funny,
say, are more likely to be remembered than trepan and surd. But logically
prior to the meaning of a word is its physical manifestation as a unit of
neuromuscular action in the speaker and as an auditory event in the listener.
Since the listening child readily becomes a speaker, even of words that she
does not understand, the sound of a word must, at the very least, carry
information on how to speak it. More exactly, the sound reflects a pattern of
changes in laryngeal posture and in the supralaryngeal cavities of the vocal
tract. The minimal endowment of the child is therefore a capacity to
reproduce a functionally equivalent motor pattern with her own apparatus.
What properties of the speech signal guide the child's reproduction?

We do not know the answer to this question. We do not even know the
appropriate dimensions of description. But several lines of evidence suggest
that the properties may be more dynamic and more abstract than customary
descriptions of spectral sections and spectral change. For example, some half
dozen studies have demonstrated "trading relations" among acoustically
incommensurate portions of the signal (e.g., Liberman & Pisoni, 1977; Repp,
Liberman, Eccardt, & Pesetsky, 1978; Fitch, Halwes, Erickson, & Liberman,
1980). Perhaps the most familiar example is the relation between onset
frequency of first formant transition and delay in voicing at the onset of a
stop consonant-vowel syllable: reciprocal variations in spectral structure
and duration of delay produce equivalent phonetic percepts (Summerfield &
Haggard, 1977). Presumably, the grounds of this and other such equivalences
lie in the articulatory dynamics of natural speech, of which we do not yet
have an adequate account. (For a review of studies of this type, see Repp,
1981.)

A second line of evidence comes from studies of sine-wave speech
synthesis. Remez, Rubin, Pisoni, and Carrell (1981) have shown that much, if
not all, of the information for the perception of a novel utterance is
preserved if the acoustic pattern, stripped of variations in overall amplitude
and in the relative energy of formants, is reduced to a pattern of modulated
sine waves following the approximate center frequencies of the three lowest
formants. Here, it seems, nothing of the original signal is preserved other
than changes, and derivatives of changes, in the frequency positions of the
main peaks of the vocal tract transfer function (cf. Kuhn, 1975).
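
The reduction just described can be illustrated with a rough sketch: one
constant-amplitude sinusoid per formant, each following a time-varying center
frequency, summed into a single signal. The formant tracks, frame duration,
sample rate, and function name below are invented for illustration and are not
taken from the study cited above; the sketch only shows the general technique.

```python
import math

def sine_wave_replica(formant_tracks, frame_dur=0.01, sample_rate=10000):
    """Sum one constant-amplitude sinusoid per formant track.

    formant_tracks: list of lists; each inner list holds center
    frequencies in Hz, one value per frame (hypothetical data).
    Returns a list of float samples scaled to the range [-1, 1].
    """
    n_frames = len(formant_tracks[0])
    samples_per_frame = int(round(frame_dur * sample_rate))
    phases = [0.0] * len(formant_tracks)   # running phase per sinusoid
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            total = 0.0
            for k, track in enumerate(formant_tracks):
                # Advance phase by the instantaneous frequency so that
                # frequency changes do not introduce discontinuities.
                phases[k] += 2.0 * math.pi * track[frame] / sample_rate
                total += math.sin(phases[k])
            out.append(total / len(formant_tracks))
    return out

# Hypothetical three-formant glide over 20 frames (200 ms).
f1 = [300 + 20 * i for i in range(20)]
f2 = [1800 - 30 * i for i in range(20)]
f3 = [2500] * 20
signal = sine_wave_replica([f1, f2, f3])
```

Because amplitude is held constant, the only thing that varies in the output
is the frequency position of each sinusoid over time, which is the property
the paragraph above singles out.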

Finally, several recent audio-visual studies have shown that phonetic
judgments of a spoken syllable can be modified if the listener simultaneously
watches a video presentation of a face mouthing a different syllable: for
example, a face uttering [ga] on video, while a loudspeaker presents [ba], is
usually judged to be saying [da] (McGurk & MacDonald, 1976; Summerfield,
1979). The phonetic percept, in such a case, evidently derives from some
combination of abstract, dynamic properties that characterize both auditory
and visual patterns.

Moreover, infants are sensitive to dynamic correspondences between speech
heard and speech seen. Three-month-old infants look longer at the face of a
woman reading nursery rhymes if auditory and visual displays are synchronized
than if the auditory pattern is delayed by 400 milliseconds (Dodd, 1979).
This finding evidently reflects more than a general preference for audiovisual
synchrony, since six-month-old infants also look longer at the video display
of a face repeating a disyllable that they hear (e.g., [lulu]) than at the
synchronized display of a face repeating a different disyllable (e.g., [mama])
(MacKain, Studdert-Kennedy, Spieker, & Stern, Note 1).

The point here is not the cross-modal transfer of a pattern, which can be
demonstrated readily in lower animals. Rather, it is the inference from this
cross-modal transfer, and from the other evidence cited, that the speech
signal conveys information about articulation by means of an abstract (and
therefore modality-free) dynamic pattern. The infant studies hint further
that the infant learns to speak by discovering its capacity to transpose that
pattern into an organizing scheme for control of its own vocal apparatus.

Here we should note that, while the capacity to imitate general motor
behavior may be quite common across animal species, a capacity for vocal
imitation is rare. We should also distinguish social facilitation and general
observational learning from the detailed processes of imitation, evidenced by
the cultural phenomenon of dialects among whales, seals, certain songbirds,
and humans. Finally, we should note that speech (like musical performance
and, perhaps, dance) has the peculiarity of being organized, at one level of
execution, in terms of a relatively small number of recurrent and, within
limits, interchangeable gestures. Salient among these gestures are those that
correspond to the processes of closing and opening the vocal tract, that is,
to the onsets (or offsets) and to the nuclei of syllables.

We do not have to suppose that the child must analyze adult speech into
features, segments, syllables, or even words, before she can set about
imitating what she has heard. To suppose this would be to posit for speech a
mode of development that precisely reverses the normal (phylogenetic and
ontogenetic) process of differentiation. And, in fact, the earliest
utterances used for symbolic or communicative ends seem to be prosodic
patterns, which retain their unity across a wide variety of segmental
realizations (Menn, 1976). Moreover, the early words also seem to be
indivisible: for example, the child commonly pronounces certain sounds
correctly in some words, but not in others (Menyuk & Menn, 1979). This
implies that the child's first pass at the adult model of a word is an
unsegmented sweep, a rough, analog copy of the unsegmented syllable. And
there is no reason to believe that the child's percept is very much more
differentiated than her production. Differentiation begins, perhaps, when,
with the growth of vocabulary, recurrent patterns emerge in the child's motor
repertoire. Words intersect, and similar control patterns coalesce into more
or less invariant segments. The segmental organization is then revealed to
the listener by the child's distortions. Menn (1978, 1980) describes these
distortions as the result of systematic constraints on the child's output:
the execution of one segment of a word is distorted as a function of the
properties of another. She classifies these constraints in terms of consonant
harmony (e.g., [gʌk] for duck), consonant sequence (e.g., [nos] for snow),
relative position (e.g., [dæge] for 'gator), and absolute position (e.g., [ɪʃ]
for fish).

Here we touch on deep issues concerning the origin and nature of
phonological rules. But the descriptive insights of Menn and others working
in child phonology are important to the present argument because they seem to
justify a view of the phonetic segment as emerging from recurrent motor
patterns in the execution of syllables rather than as imposed by a specialized
perceptual device. As motor differentiation proceeds, these recurrent
patterns form classes, defined by their shared motor components (shared, in
part, because the vocal tract has relatively few independently movable parts).
These components are, of course, the motor origins of phonetic features
(cf. Studdert-Kennedy & Lane, 1980). Some such formulation is necessary to
resolve the paradox of a quasi-continuous signal carrying a segmented
linguistic message. The signal carries no message: it carries information
concerning its source. The message lies in the peculiar relation between the
source and the listener, as a human and as a speaker of a particular language.

Readers familiar with the work of Turvey and Shaw (e.g., 1979) will
recognize that the present sketch of a new approach to speech perception owes
much to their ecological perspective (as also to Fowler, Rubin, Remez, &
Turvey, 1980). What may not be generally realized is that this perspective is
highly compatible with much recent work in natural phonology (e.g., Stampe,
1979), child phonology (e.g., Menn, 1980), and phonetic theory (e.g.,
Lindblom, 1980; MacNeilage & Ladefoged, 1976; Ohala, in press). For example,
Lindblom and his colleagues have, for several years, been developing
principles by which the feature structure of the sound systems of different
languages might be derived from perceptual and articulatory constraints. More
generally, Lindblom (1980) has stressed that explanatory theory must refer
"...to principles that are independent of the domain of the observations
themselves" (p. 18) and has urged that phonetic theory "...move [its] search
for basic explanatory principles into the physics and physiology of the brain,
nervous system and speech organs..." (p. 18). In short, if language is a
window on the mind, speech is the thin end of an experimental wedge that will
pry the window open. The next ten years may finally see the first steps
toward a genuine biology of language.


AUDITORY INFORMATION FOR BREAKING AND BOUNCING EVENTS: A CASE STUDY IN
ECOLOGICAL ACOUSTICS

William H. Warren, Jr.* and Robert R. Verbrugge*



Abstract. The mechanical events of bouncing and breaking glass are
acoustically specified by single vs. multiple damped quasi-periodic
pulse patterns, with an initial noise burst in the case of breaking.
Subjects show high accuracy in distinguishing natural tokens of
these two events and tokens constructed by adjusting the
periodicities of spectrally identical components. Differences in average
spectral frequency are therefore not necessary for perceiving this
contrast, though differences in spectral consistency over successive
pulses apparently are important. Initial noise corresponding to
glass rupture is not necessary to distinguish breaking from
bouncing, but may be important for identifying breaking in isolation.
The data indicate that higher-order temporal invariants in the
acoustic signal provide information for the auditory perception of
these events.

Research in auditory perception has emphasized the detection and
processing of sound elements with quasi-stable spectral structure, such as
tones, formants, and bursts of noise. In the spectral domain, these elements
are distinguished by frequency peak or range, bandwidth, and amplitude. In the
temporal domain, acoustic analysis has often been limited to the durations of
sound elements and the intervals between them. Much of traditional perceptual
research, including that of classical psychoacoustics, has focused on
listeners' response accuracy to essentially time-constant functions of
frequency, amplitude, and duration, on the assumption that complex auditory
percepts are compositions over sound elements with those properties (Fletcher,
1934; Helmholtz, 1863/1954; Plomp, 1964; see Green, 1976).

The perceptual role of time-varying properties of sound has received
comparatively little attention. Some exceptions to this can be found in
research on amplitude and frequency modulation, particularly as they relate to
classical auditory phenomena such as beats and periodicity pitch. In general,
however, research on time-varying properties has been most common in the study
of classes of natural events, such as human speech, music, and animal
communication, where an analysis of sound into quasi-stable elements is often
problematic. In the case of speech, for example, many phonemic contrasts can
be defined by differences in the direction and rate-of-change of major speech
resonances (see Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967;
Liberman, Delattre, Gerstman, & Cooper, 1956). Some research on the
perception of music has also demonstrated the perceptual significance of
time-varying properties. Identification of musical instruments, for example,
is strongly influenced by the temporal structure of transients that accompany
tone onsets (Luce & Clark, 1962; Saldanha & Corso, 1964). In particular, the
relative onset timing and the rates of amplitude change of upper harmonics
have been found to be critical properties of attack transients that permit
distinctions among instrument families (Grey, 1977; Grey & Gordon, 1978).
Animal vocalizations are similarly rich in time-varying properties (such as
rhythmic pulsing, frequency modulation, and amplitude modulation), and many of
these properties have been shown to be critical for distinguishing the
species, sex, dangerousness, location, and motivational state of the producer
(e.g., Brown, Beecher, Moody, & Stebbins, 1978; Konishi, 1978; Peterson,
Beecher, Zoloth, Moody, & Stebbins, 1978).

It is noteworthy that in each of these areas of research on natural
events, the discovery or explanation of perceptually significant,
time-varying, acoustic properties has been motivated by an analysis of the
time-varying behavior of the sound source. In the case of speech, for example,
an analysis of speech production has been an integral part of the search for
the acoustic basis for speech perception (e.g., Fant, 1960; Fowler, 1977;
Fowler, Rubin, Remez, & Turvey, 1980; Liberman et al., 1967; Verbrugge, Rakerd,
Fitch, Tuller, & Fowler, in press). It is also worth noting that researchers in
these areas have often found it more useful to characterize perceptual
information in terms of higher-order structure in sound, that is, in terms of
functions over the traditional measures of frequency, amplitude, and duration.
Given the time-varying behavior of the sound sources involved, it is not
surprising that many of these functions are time-dependent in nature, defining
rates of change and styles of change in lower-order acoustic variables.
Finally, it is not uncommon for researchers in these fields to view this
temporal structure as a property of the sound stream itself, rather than as a
property that must be introduced by a perceiver while constructing a percept.

The role of time-varying properties in the perception of other familiar
events in the human environment is largely unknown, and research on the
subject has been sparse. Our goal in this paper is to demonstrate by argument
and example that higher-order temporal structure can be important for
distinguishing such events.

It is apparent from everyday experience that listeners can detect
significant aspects of the environment by ear, from a knock at the door to the
condition of an automobile engine and the gait of an approaching friend. Such
naturalistic observations were recently verified in experiments by VanDerveer
(Note 1, Note 2). She presented 30 recorded items of natural sound in a free
identification task and found that many events such as clapping, footsteps,
jingling keys, and tearing paper were identified with greater than 95%
accuracy. Subjects tended to respond by naming a mechanical event that
produced the sound, and reported their experiences in terms of sensory
qualities only when source recognition was not possible. VanDerveer (Note 1)
also found that confusion errors in identification tasks and clustering in
sorting tasks tended to group acoustic events by common temporal patterns.
For example, hammering was confused with walking, and the scratching of
fingernails was confused with filing, but hammering and walking were not
confused with the latter two events.

These results support the general claim that sound in isolation permits
accurate identification of classes of sound-producing events when the temporal
structure of the sound is specific to the mechanical activity of the source
(Gibson, 1966b; Schubert, 1974; Warren & Verbrugge, in press). If
higher-order information is found to be specific to events, while values of
lower-order variables per se are not, then it may be more fruitful to view the
auditory system as being designed for the perception of source events (via
higher-order acoustic functions), rather than for the detection of
quasi-stable sound elements. Schubert (1974) put this succinctly in his "Source
Identification Principle" for auditory perception: "Identification of sound
sources, and the behavior of those sources, is the primary task of the
[auditory] system" (p. 126).

This general perspective on auditory perception is coming to be called
"ecological acoustics," on a direct analogy to the ecological optics advocated
by Gibson (1961, 1966b) as an approach to vision. The ecological approach
leads to research that is similar in many respects to the work summarized
above on speech, music, and animal communication. In general terms, the
strategy for research is to identify the higher-order properties that are
defined over the course of a natural sound-producing event, and then to assess
the ability of listeners to utilize that potential information. A physical
analysis of the source and its behavior is an essential part of the strategy,
both for identifying acoustic variables that might otherwise be missed, and
for bounding the set of possible variables in a principled fashion.
Furthermore, demonstrating the specificity of acoustic structure to the source
event is crucial to avoid the introduction of ad hoc processing principles to
buttress perception (Shaw, Turvey, & Mace, in press).

In addition to offering a research strategy, the ecological approach
seeks a general analysis of events and a description of the perceptual
information specific to them. This analysis is based on the observation that
identifiable objects participate in identifiable transformations or "styles of
change" (Gibson, 1966a; Pittenger & Shaw, 1975; Shaw & Cutting, 1980; Shaw,
McIntyre, & Mace, 1974; Shaw & Pittenger, 1978; Johansson, Hofsten, & Jansson,
1980). More precisely, a class of objects may be functionally defined in
terms of structure that is preserved and destroyed under certain
transformations. The information that specifies the kind of object and its
properties under change is known as the structural invariant of an event
(Pittenger & Shaw, 1975). Reciprocally, the information that specifies the
style of change is known as the transformational invariant, which may be
described jointly in terms of the geometric properties that remain constant and
those that vary systematically under change (Pittenger & Shaw, 1975; Mark,
Todd, & Shaw, in press).

By such an analysis, events can be organized into equivalence classes
("types") that are defined by sets of transformational and structural
invariants. Consider, for example, the style of change of walking and the
animals with appropriate limb structures, or the style of change of burning and
the objects that are combustible under terrestrial conditions. Within any
equivalence class of events, an indefinite number of particular instances
("tokens") are possible, each preserving the invariants of the class but
individuated in space and time: that charging rhino, or this burning bridge.
For any perceptible event, information about its class membership is, by
hypothesis, available by means of the physical media it disturbs. An analysis
of such potential information and its relationship to the source event is a
major goal of ecological physics.

The present paper explores the acoustic aspects of dropping a glass
object and its subsequent bouncing or breaking. Bouncing and breaking are two
distinct styles of change that may be wrought over a variety of objects, such
as bottles, plates, pottery, and other ceramics. These two events would be
identified by Gibson (1979, pp. 94-95) as changes of the layout of surfaces
due to physical force: bouncing as a case of successive collisions, breaking
as a compound event of surface rupturing followed by successive collisions
(and possible further rupturing) of the broken pieces. The two styles of
change constitute disjoint equivalence classes of events: the breaking and
the bouncing of semi-elastic objects. By acoustic and perceptual studies of
these events, we hope to discover the transformational invariants that
distinguish them. (Structural invariants specifying individual properties of
the objects such as size, shape, and material, and individual transformation
properties such as height of drop, force of impact, and angle of impact, are
discussed in Warren & Verbrugge, in press.)

Consider first the mechanical action of a bottle bouncing on a hard
surface (see Figure 1a). Each collision consists of an initial impact that
briefly sets the bottle into vibration at a set of frequencies determined by
its size, shape, and material composition. This is reflected in the acoustic
signal as an initial burst of noise followed by spectral energy concentrated
at a particular set of overtone frequencies. Over a series of bounces, the
collisions between object and ground occur with declining impact force and
decreasing ("damped") period, although some irregularities in the pattern may
occur due to the asymmetry of the bottle. The spectral components are similar
across bounces, with relative overtone amplitudes varying slightly due to the
varying orientations of the bottle at impact. (The spectrum within each pulse
is quasi-stable, and is conventionally described in terms of spectral peaks in
a cross-section of the signal.) These acoustic consequences may be described
as a single damped quasi-periodic pulse train in which the pulses share a
similar cross-sectional spectrum (Figure 2a). It is this single pulse train
that we suggest constitutes a transformational invariant of temporal
patterning for the bouncing style of change.
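
As a minimal sketch of what a "damped" pulse pattern looks like in time, the
code below generates pulse onsets whose spacing shrinks from bounce to bounce.
It assumes, purely for illustration, that each inter-bounce interval is a fixed
fraction of the previous one; the text says only that the period decreases and
is quasi-periodic, so the ratio, the first interval, and the function name are
hypothetical.

```python
def damped_onsets(first_interval_ms, ratio, n_bounces):
    """Return onset times (ms) of a damped pulse train.

    Assumes each inter-bounce interval is `ratio` times the previous one
    (0 < ratio < 1), an idealization not taken from the paper.
    """
    onsets = [0.0]
    interval = first_interval_ms
    for _ in range(n_bounces - 1):
        onsets.append(onsets[-1] + interval)
        interval *= ratio
    return onsets

# Hypothetical values: 8 bounces, first gap 200 ms, each gap 80% of the last.
onsets = damped_onsets(first_interval_ms=200.0, ratio=0.8, n_bounces=8)
intervals = [b - a for a, b in zip(onsets, onsets[1:])]
```

A real bottle would add irregularities to these intervals because of its
asymmetry, as noted above; the sketch leaves them out.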

Turning to the mechanical action of breaking (Figure 1b), it is evident
that a catastrophic rupture occurs upon impact. Assuming an idealized case,
the resulting pieces then continue to bounce without further breakage, each
with its own independent collision pattern. The acoustic consequences appear
as an initial rupture burst dissolving into overlapping multiple damped
quasi-periodic pulse trains, each train having a different cross-sectional
spectrum and damping characteristic (Figure 2b). We propose that a compound
signal, consisting of a noise burst followed by such multiple pulse trains,
constitutes the transformational invariant that specifies the compound event of
breaking.

Aside from these aspects of temporal pattern and initial noise, certain
crude spectral differences between breaking and bouncing can be observed by
comparing spectrograms of natural cases (Figure 2). First, the overtones of
breaking events are distributed across a wider range of frequencies than are
those of bouncing events. Second, the overtones of breaking are denser in the
frequency domain. Both of these properties can be traced to the contrast
between a single object in vibration and a number of disparate objects
simultaneously in vibration.

The following experiments test the hypothesis that temporal patterning,
rather than some quasi-stable spectral property, distinguishes breaking from
bouncing. By superimposing recordings of individual pieces of broken glass,
cases of breaking and bouncing can be constructed from a common set of pieces
by varying the temporal correspondences among their collision patterns.
Experiment 1 establishes that listeners can identify natural cases of breaking
and bouncing with high accuracy. Experiment 2 examines performance on
constructed cases that include an initial breakage burst, and compares it with
the results for natural sound. Finally, Experiment 3 assesses the
contribution of the burst by removing it from both natural and constructed
cases of breaking.

EXPERIMENT 1: NATURAL SOUND 



The first experiment determines whether natural sound provides sufficient
acoustic information for listeners to distinguish the events of breaking and
bouncing.



Method

Materials. Natural recordings were made of three glass objects dropping
onto a concrete floor covered by linoleum tile in a sound-attenuated room.
Using a Crown 800 tape deck, the sound of each object was recorded when the
object was dropped from a 1 ft. height (bouncing), and when it was dropped
from a 2 to 5 ft. height (breaking). This yielded three tokens of bouncing
and three tokens of breaking. The objects used and the durations of the
bouncing (BNC) and breaking (BRK) events are as follows: (1) 32 oz. jar:
BNC1 = 1600 msec, 22 bounces; BRK1 = 1200 msec. (2) 64 oz. bottle:
BNC2 = 1600 msec, 15 bounces; BRK2 = 550 msec. (3) 1 litre bottle:
BNC3 = 1300 msec, 17 bounces; BRK3 = 700 msec. The recordings were digitized
at a 20 kHz sampling rate using the Pulse Code Modulation (PCM) system at
Haskins Laboratories. A test tape was then recorded; it contained 20 trials
of each natural token in randomized order for a total of 120 test trials. A
pause of 3 sec occurred between trials, and a pause of 10 sec occurred after
every six trials.
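
The trial structure of the test tape can be sketched as follows. This is only
an illustration of the counts given above (six tokens, 20 repetitions each,
120 trials); the function names and the random seed are hypothetical, and the
sketch assumes that the 10 sec pause replaces, rather than adds to, the 3 sec
pause after every sixth trial, which the text does not state.

```python
import random

def build_test_order(tokens, repetitions=20, seed=0):
    """Return a randomized trial list with each token repeated equally."""
    trials = [t for t in tokens for _ in range(repetitions)]
    random.Random(seed).shuffle(trials)
    return trials

def pause_after(trial_index, short_s=3, long_s=10, block=6):
    """Pause (sec) following the 1-based trial_index.

    Assumption: the long pause replaces the short one at block boundaries.
    """
    return long_s if trial_index % block == 0 else short_s

tokens = ["BNC1", "BNC2", "BNC3", "BRK1", "BRK2", "BRK3"]
order = build_test_order(tokens)
total_pause_s = sum(pause_after(i) for i in range(1, len(order) + 1))
```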

Subjects. Fifteen graduate and undergraduate students participated in
the experiment for payment or course credit.

Procedure. Subjects were run in groups of two to five and listened to
the tape binaurally through headphones. They were told that they would be
hearing recordings of objects that had either bounced or broken after being
dropped, but were told nothing about the nature of the objects involved.
Their three-choice task was to identify each event as a case of breaking or
bouncing, with a "don't know" option, by placing a check in the appropriate
column on an answer sheet. The "don't know" category was included to minimize
the possibility that subjects would choose one of the two event categories
even when they found the sound unconvincing, as they would be forced to do in
a two-choice situation. They were specifically instructed to ignore the
nature of the object involved and attend to "what's happening to
it." Subjects received no practice trials or feedback. There was a short
break after 60 trials, and a test session lasted about 20 min.

Results and Discussion 

Overall performance on natural bouncing tokens was 99.3% correct
("bouncing" judgments), and on breaking tokens was 98.5% correct ("breaking"
judgments). "Don't know" responses accounted for 0.2% of all answers on
bouncing tokens and 0.7% on breaking tokens. Experiment 1 clearly demonstrates
that sufficient information is present in the acoustic signal to permit
unpracticed listeners to distinguish the events of bouncing and breaking.



EXPERIMENT 2: CONSTRUCTED SOUND

Experiment 2 attempted to model the time-varying information contained in
natural recordings by using constructed cases of bouncing and breaking,
eliminating average spectral differences between the two.

Method 

Materials. Tokens intended to model bouncing and breaking were
constructed by the following method. Initially, individual recordings were made
of four major pieces of glass from a broken bottle as each piece was dropped
and bounced separately from a low height. These recordings were combined in
two ways using the PCM system.

To construct a bouncing token, the temporal pattern of each piece was
adjusted to match a single master periodicity arbitrarily borrowed from a
recording of a natural bouncing bottle (Figure 3a). This was accomplished by
inserting tape hiss between the bounce pulses in recordings of the individual
pieces. After all four pieces had been adjusted so that their onsets matched
the same pulse pattern, they were superimposed by summing the instantaneous
amplitudes of the digitized recordings. The result was a combined pulse
pattern with synchronized onsets for all bounces, preserving the invariant of
a single damped quasi-periodic pulse train to model bouncing (Figure 4a).

A breaking token was constructed by readjusting the same four pieces to
match four different temporal patterns (Figure 3b). As a first approximation,
these master patterns were borrowed from measurements of four different
bouncing bottles, since the likely patterns of individual pieces of glass in
the course of natural breaking were unknown. These four patterns were
initiated simultaneously, preceded by 50 msec of noise burst taken from the
original rupture. The result after superimposing these four independent
temporal series was a combined pattern with asynchronous pulse onsets,
preserving the temporal invariant of multiple damped quasi-periodic pulse
trains to model breaking (Figure 4b). Note that the variables of temporal
patterning and initial noise were confounded in this experiment. To the
experimenters' ears the burst improved the quality of apparent breakage, but
this assumption was later tested in Experiment 3.

[Figure caption fragment: ...pulse trains: (a) bouncing, with synchronous
pulse onsets; (b) breaking, with initial noise burst and asynchronous pulse
onsets.]
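
The superposition step for both kinds of token can be sketched in a few lines.
This is a simplified illustration, not the procedure run on the PCM system:
the "pieces" are placeholder clicks rather than recordings of glass, the onset
patterns are invented, silence stands in for the tape hiss inserted between
pulses, and all function names are hypothetical. Only the 20 kHz rate, the
four pieces, the 50 msec burst, and the idea of summing instantaneous
amplitudes come from the text.

```python
import random

RATE = 20000  # samples per second, matching the 20 kHz rate in the text

def place_pulses(pulse, onsets_ms, total_ms):
    """Lay copies of `pulse` (a list of samples) at the given onsets."""
    track = [0.0] * (total_ms * RATE // 1000)
    for onset in onsets_ms:
        start = onset * RATE // 1000
        for i, s in enumerate(pulse):
            if start + i < len(track):
                track[start + i] += s
    return track

def superimpose(tracks):
    """Sum instantaneous amplitudes across equally long tracks."""
    return [sum(samples) for samples in zip(*tracks)]

# Placeholder "pieces": short decaying clicks with different amplitudes.
pieces = [[a * 0.5 ** i for i in range(40)] for a in (1.0, 0.8, 0.6, 0.4)]

# Bouncing model: every piece follows one master pattern (hypothetical).
master = [0, 200, 360, 488]
syn = superimpose([place_pulses(p, master, 600) for p in pieces])

# Breaking model: one pattern per piece, after a 50 msec noise burst.
patterns = [[0, 200, 360], [0, 150, 270], [0, 120, 210], [0, 90, 160]]
rng = random.Random(0)
burst = [rng.uniform(-1, 1) for _ in range(50 * RATE // 1000)]
asyn = burst + superimpose(
    [place_pulses(p, pat, 600) for p, pat in zip(pieces, patterns)]
)
```

In the synchronous token every onset carries the summed energy of all four
pieces, while in the asynchronous token the pieces coincide only at the shared
starting point and then drift apart.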

Hence, the only differences between bouncing and breaking tokens were in
the temporal registration of pulse onsets and the presence (or absence) of
initial noise. The range and distribution of average spectral frequencies
were similar in the two cases. Mean overall durations differed, averaging
1107 msec for bouncing tokens and 733 msec for breaking tokens; in general
this factor is related to object elasticity and the height of drop, and it is
therefore not a likely candidate for information specific to a style of
change.

There were certain problems with the constructed cases. The process of 
superimposing pulse patterns also summed tape hiss and hum, so that background 
noise was increased. Moreover, constructing the sound of a single bouncing 
object by combining the spectral components of four independently bouncing 
pieces produced in one case a noise that sounded more like metal than glass 
material; nevertheless, the temporal invariant was preserved. The other two 
bouncing tokens sounded like glass. Finally, the use of only four pieces of 
glass to simulate breaking, the assumption that their periodicities were akin 
to those of a bouncing bottle, and the assumption of no further breakage after 
the initial catastrophe were all rather arbitrary idealizations. 
Nevertheless, if temporal patterning constitutes information for breaking and 
bouncing, subjects should be able to make reliable judgments of these tokens. 

Three cases of bouncing and three corresponding cases of breaking were 
produced by this method, each pair constructed from a unique set of original 
pieces and matched to a unique set of master periodicities. The original 
objects, and the durations of the bouncing or synchronous (SYN) and breaking 
or asynchronous (ASYN) tokens constructed from their pieces, were as follows: 
(1) 32 oz. jar: SYN1 = 1000 msec, 8 bounces; ASYN1 = 950 msec. (2) 32 
oz. jar: SYN2 = 1400 msec, 13 bounces; ASYN2 = 650 msec. (3) 64 oz. bottle: 
SYN3 = 920 msec, 9 bounces; ASYN3 = 600 msec. 

Subjects. Fifteen graduate and undergraduate students participated in 
the experiment for payment or course credit. None of them had participated in 
Experiment 1. 

Procedure. The procedure was the same as that in Experiment 1, with the 
exception that trials were presented in blocks of ten rather than blocks of 
six. Instructions to the subjects were the same, including the instruction to 
ignore object properties and concentrate on the style of change. 

Results and Discussion 

The results for each constructed token appear in Table 1, and are 
consistent with the predictions of the temporal patterning hypothesis. 
Bouncing judgments on synchronous tokens averaged 90.7%, and breaking 
judgments on asynchronous tokens averaged 86.7% (these judgments being 
treated as "correct"). "Don't know" answers accounted for 0.1% of all 
responses on bouncing tokens, and 1.3% on breaking tokens. Considering the 
artificial nature of the constructed cases and the idealizations involved, 
their identifiability may be considered quite high. 

Table 1 

Percent Correct Judgments on Constructed Tokens (Experiment 2) 

Token      Bouncing              Breaking 
1          93.7 (18.7, 1.7)      89.0 (17.8, 2.9) 
2          94.3 (18.9, 2.0)      71.3 (14.3, 1.4) 
3          84.3 (16.9, ?)        99.7 (19.9, 0.3) 
Overall    90.7                  86.7 

Note: Mean scores and standard deviations are in parentheses (? = illegible). 
Scores are based on 20 trials per subject per cell, N=15. 

Some departures from the general pattern were found for token ASYN2, 
which showed a markedly lower level of correct performance (71.3%), a higher 
standard deviation for "breaking" judgments, and a relatively high rate of 
"don't know" responses (?.0%). These differences were primarily due to the 
low performance of five subjects who averaged 44% correct on this token, while 
the performance of the other ten averaged 85.0%. It may be noted that the 
summed background noise was greater in ASYN2 than in the other two breaking 
cases. The fact that overall performance in this case was well above chance 
indicates that even the token of lowest identifiability contained sufficient 
information to distinguish the two events. 
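
As a quick arithmetic check, the two subgroup averages reported for ASYN2 
can be pooled, weighting each by its number of subjects, to see that they 
reproduce the overall figure for that token. The function name below is 
hypothetical; the inputs are the values given in the text. 

```python
def pooled_mean(groups):
    """Weighted mean of (n_subjects, mean_percent) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# Five subjects averaging 44% and ten subjects averaging 85% on ASYN2.
asyn2 = pooled_mean([(5, 44.0), (10, 85.0)])
print(round(asyn2, 1))  # 71.3
```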

It is not surprising that some tokens of constructed breaking are more 
convincing than others, as there are certainly some natural instances that are 
more compelling than others. The differences among tokens may involve both 
the spectral distinctiveness of the broken pieces and their degree of 
asynchrony. In pilot tests, when the pulses from a single piece were adjusted 
to four different periodicities, the resulting sum of the four patterns did 
not specify breaking. Apparently, distinct spectral properties for each piece 
are necessary to distinguish multiple pulse trains (see Figure 1b). The 
reciprocal bouncing case, in which successive pulses were borrowed from 
different bottles, similarly failed to yield a coherent bouncing event. 
Hence, spectral similarity across pulses appears to be necessary to specify 
the unity of a single pulse train. 

In general, performance with constructed sound was similar to that found 
for natural sound in Experiment 1. Although performance with constructed 
cases was somewhat lower than with natural cases, the differences were only 
about 10% on average, and performance with both natural and constructed cases 
was far above the chance level. The data permit us to conclude that temporal 
patterning is compelling information for breaking and bouncing. In other 
words, constructed and natural cases appear to specify the same general 
equivalence classes of breaking and bouncing events to a listener. 

EXPERIMENT 3: INITIAL NOISE SPECIFIC TO RUPTURING 

To isolate the variable of single vs. multiple pulses and assess the 
importance of the initial noise burst in specifying breaking, the first two 
experiments were repeated with initial noise removed from both natural and 
constructed cases. Pilot work indicated that the first 80 msec of the signal 
in natural breaking and bouncing cases was not, in isolation, sufficient to 
distinguish the two events. Experiment 3 was conducted to determine whether 
the initial noise, in addition to the pulse patterns, was necessary to 
distinguish breaking from bouncing. 

Method 



Materials. Both natural and constructed tokens were prepared. Bouncing 
tokens were the same as those used in the two previous experiments. For 
breaking cases, the constructed tokens from Experiment 2 were modified by 
removing the 50 msec of initial noise that had been added for that experiment. 
The natural breaking tokens from Experiment 1 were modified by removing the 
naturally occurring burst. Since there was no distinct boundary in the 
natural waveform between the rupture burst and the subsequent collision 
pulses, the natural tokens were edited by removing noise identifiable on an 
oscillogram and by listening for the absence of a burst. This technique 
resulted in the removal of the initial 80 msec from BRK1, 50 msec from BRK2, 
and 60 msec from BRK3. In sum, there were three tokens of bouncing and three 
tokens of breaking (without initial noise) in both the natural and constructed 
conditions. 

Subjects. Thirty graduate and undergraduate students participated in the 
experiment for payment. None had participated in the previous experiment. 

Procedure. The natural and constructed conditions were run separately 
with two different groups of 15 subjects. The procedure and instructions were 
the same as before, with each group receiving 120 randomly ordered trials in 
blocks of six. 

Results and Discussion 

The results for each token appear in Table 2. With natural cases, the 
overall performance was 99.8% correct on bouncing tokens and 96.0% correct on 
breaking tokens; with the constructed cases it was 93.0% for bouncing and 
86.7% for breaking. These results were nearly identical to those of 
Experiment 1 with natural sound and Experiment 2 with constructed sound. 
"Don't know" answers accounted for 0.0% of all responses on natural bouncing 
tokens, 1.0% on natural breaking tokens, 2.0% on constructed bouncing tokens, 
and 4.0% on constructed breaking tokens. 

Hence, removal of initial noise from breaking tokens does not reduce 
their discriminability. Finding this result for the natural cases indicates 
that the burst is not necessary to distinguish the two events. The same 
finding with constructed cases demonstrates that variation in the temporal 
patterning of pulse onsets is alone sufficient to discriminate breaking and 
bouncing. 

However, we may question whether pulse patterning alone is sufficient to 
specify a breaking event in isolation. Following the test sessions, a number 
of subjects in Experiment 3 reported that natural and constructed breaking 
cases without initial noise often provided weak instances of the event, some 
sounding more like "wind chimes," "bells," "spoons dropping," or "ice cubes in 
a glass" (in other words, like multiple collisions of initially independent 
objects). Others reported precisely what was presented: "pieces falling after 
the break, without an initial crash." Although the acoustic structure was 
sufficient to distinguish the event of breaking from that of bouncing, and not 
ambiguous enough to merit a "don't know," it could nevertheless specify wind 
chimes, not breaking glass, when heard in isolation. Since breaking is a 
compound event, it is not surprising that the causal transition from one to 
many pieces must be specified by an initial rupture noise. 

Table 2 

Percent Correct Judgments on Natural and Constructed Tokens 
Without Initial Noise (Experiment 3) 

              Natural                              Constructed 
Token    Bouncing          Breaking          Bouncing          Breaking 
1        99.7 (19.9, 0.3)  93.7 (18.7, 2.?)  94.0 (18.8, 2.2)  83.7 (16.7, 2.9) 
2        99.7 (19.9, 0.3)  97.7 (19.5, 1.6)  97.3 (19.5, 0.9)  76.7 (15.3, 4.1) 
3        100.0 (20.0, 0.0) 96.7 (19.3, 2.3)  87.7 (17.5, 2.7)  99.7 (19.9, 0.3) 
Overall  99.8              96.0              93.0              86.7 

Note: Mean scores and standard deviations are in parentheses (? = illegible). 
Scores are based on 20 trials per subject per cell. N=15 in the Natural 
condition and same in the Constructed condition. 

In general, these observations are consistent with our original 
hypothesis that breaking is specified by a complex acoustic configuration 
consisting of an initial noise followed by multiple quasi-periodic pulse 
trains. Further work remains to be done to determine whether the initial 
noise is necessary for identifying breakage under conditions less constrained 
than in the present experiment. 



GENERAL DISCUSSION 

The preceding experiments have attempted to determine whether 
higher-order, time-varying properties constitute effective acoustic 
information for the events of bouncing and breaking. The results show that 
differences in the temporal patterning of component pulse onsets are 
sufficient to perceptually distinguish the two events, with or without an 
initial burst. These temporal invariants override any contribution of average 
spectral properties in distinguishing the events. The results provide 
evidence that certain damped periodic patterns, plus initial noise, constitute 
transformational invariants that specify breaking and bouncing to a listener. 

However, if these temporal patterns are to convey the distinct events of 
breaking and bouncing, they must be carried by signals with certain spectral 
properties. Specifically, a single damped quasi-periodic pulse train must be 
of constant resonance if it is to cohere as the bouncing of a single object. 
Reciprocally, multiple damped quasi-periodic pulse trains must have different 
frequency spectra if they are to separate perceptually as independently 
bouncing shards, which together specify the breaking of an object into pieces. 
Hence, a combination of temporal and spectral patterns constitutes the 
information necessary and sufficient to specify breaking and bouncing. 

The amplitude and periodicity requirements of such patterns in bouncing 
events were considered in two simple demonstrations worth mentioning here. 
Iterating a recording of one bounce pulse to match the timing of a natural 
bouncing sequence produced a clear bouncing event, although the usual 
declining amplitude gradient was absent. However, adjusting the pulse pattern 
to create equal 100 msec intervals between pulse onsets, thereby eliminating 
the damping of the periodic pattern, destroyed the effect of perceived 
bouncing. The rapid staccato sound was like that produced by a negentropic 
machine, such as a jackhammer. A damped series of collisions, as constrained 
by gravity and the imperfect elasticity of the system, appears necessary to 
the information for bouncing. Experiments are in progress to assess the 
efficacy of period damping in specifying elasticity or "bounciness" itself. 
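
The contrast between damped and equal-interval patterns can be sketched 
with the textbook idealization of a point mass bouncing with a constant 
coefficient of restitution. This model is an assumption introduced here for 
illustration and is not taken from the experiments; the drop height and 
restitution values are likewise hypothetical. Under it, each flight time is 
the previous one multiplied by the restitution coefficient, so intervals 
shrink geometrically unless the system loses no energy. 

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def bounce_intervals(drop_height_m, restitution, n_intervals):
    """Times (s) between successive impacts for an idealized bouncing object.

    After the k-th impact the rebound speed is restitution**k times the
    first impact speed, and the following flight lasts 2 * speed / G.
    """
    v0 = math.sqrt(2 * G * drop_height_m)  # speed at first impact
    return [2 * (restitution ** k) * v0 / G for k in range(1, n_intervals + 1)]

# Imperfect elasticity: intervals shrink by a constant ratio (damped pattern).
damped = bounce_intervals(0.3, 0.8, 7)

# Perfect elasticity (restitution = 1): intervals never shrink, which is the
# isochronous case that did not sound like bouncing in the demonstration.
undamped = bounce_intervals(0.3, 1.0, 7)
```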

The experiments exemplify an ecological approach to auditory perception, 
seeking to identify higher-order acoustic information for complex events. The 
acoustic consequences of two distinct mechanical events were analyzed for 
their temporal and spectral structure, and the invariant properties sufficient 
to convey aspects of the events to a listener were empirically determined. 
Such work is preliminary to modeling auditory mechanisms capable of detecting 
these invariants (see Mace, 1977). 


SPEECH AND SIGN: SOME COMMENTS FROM THE EVENT PERSPECTIVE. 

REPORT FOR THE LANGUAGE WORK GROUP OF THE FIRST INTERNATIONAL CONFERENCE ON 
EVENT PERCEPTION* 

Carol Fowler+ and Brad Rakerd++ 



Signed and spoken utterances have at least two aspects that are of 
interest to a perceiver. First of all, they have a physical aspect, the 
significance of which is given in the lawful relations among utterances, the 
information-bearing media structured by them, and the perceptual systems of 
observers and listeners. Secondly, they have a linguistic aspect, the 
significance of which is given in the conventional or ruleful relations 
between forms and meaning.¹ In part because our time was limited, and in part 
because so little work has been done on the conventional significance of 
events (as opposed to the intrinsic significance [cf. Gibson, 1966]²), our 
work group chose to focus on the physical aspect. Nevertheless, it will be 
seen that we did have a speculative word or two to say about the origins of 
some linguistic conventions, and we would draw attention to the report of the 
Event/Cognition group, as well as to Verbrugge's remarks (discussant for the 
address by Studdert-Kennedy), for more elaborate treatments of this important 
topic. 

Roughly, our daily discussions centered around five topic areas: (1) 
useful descriptions of signed and spoken events; (2) natural constraints on 
linguistic form; (3) the origins of some linguistic conventions; (4) the 
ecology of conversation; and (5) conducting language research from an event 
perspective. Our review of these topics will highlight what seemed to us to 
be the obvious applications of the event approach and also its apparent 
limitations. 



USEFUL DESCRIPTIONS OF SIGNED AND SPOKEN EVENTS 

We considered the minimal linguistic event to be an utterance, and 
identified as such anything that a talker (signer) might choose to say (sign). 
Obviously, this definition is unsatisfactory on a number of grounds⁴; however, 
it does identify the minimal event of interest as being articulatory 
(gestural) in origin, and rejects as irrelevant those properties of 
articulation (gesture) that are not intended to have linguistic significance. 
We first attempted to verify that utterances have the "nested" character of 
other ecological events and that the nestings are perceived; next we 
considered how to discover the most useful characterization of utterances for 
the investigators' purposes of studying them as perceived events. 

*The conference was held June 7-12, 1981, in Storrs, Connecticut. The 
participants in the Language Work Group were Hollis Fitch, Carol Fowler, 
Nancy Frishberg, Kerry Green, Harlan Lane, Mark Mandel, Brad Rakerd, Robert 
Remez, Philip Rubin, Judy Shepard-Kegl, Winifred Strange, Michael 
Studdert-Kennedy, Betty Tuller, and Jerry Zimmerman. 
+Also Dartmouth College. 
++Also University of Connecticut. 
Acknowledgment. This work was supported by NICHD grant HD-01994 to Haskins 
Laboratories. 

[HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)] 



Signing and Speaking as Nested Events 

Natural events are nested in the sense that relatively slower, 
longer-term or more global events are composed of relatively faster, 
shorter-term or more local ones. For example, a football game is a 
longer-term event composed of shorter-term plays. It is clear from research 
(particularly Johansson's [e.g., 1973, 1975] on the perception of form and 
motion in point-light displays) that viewers are sensitive to the nested 
structure of events. In his address to this conference, Johansson described 
an example of light points placed on a rolling wheel. When a single point is 
affixed to the rim, a viewer who sees only that point gets no sense of the 
wheel's motion; instead, the percept is of a light moving in a cycloid 
pattern. However, when a second light is attached, now to the hub of the 
wheel, the viewer perceives rolling instead of the cycloid motion. Thus, two 
appropriately placed lights provide sufficient optical information to specify 
the distal event of rolling. 

In geometric terms, rolling involves two kinds of motion: translatory 
and rotary. These are temporally nested; a series of rotations occurs as the 
wheel translates over the ground plane. The translatory component affects the 
behavior of both light points (since both are attached to the translating 
wheel), but only the point on the rim is affected by the rotary component as 
well (since it rotates about the point on the hub). Apparently, perceptual 
sensitivity to the translation (as specified by the correlated activity of the 
two lights) forms a sort of "backdrop" for detection of the rotation; in 
essence, the translational component is "factored out" of the cycloid movement 
of the rim light, thereby revealing its rotational component. 
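
The geometry of this example can be made concrete with a short numerical 
sketch. It illustrates only the kinematics, not any perceptual mechanism; 
the function name, wheel radius, and frame count are assumptions chosen for 
illustration. The rim light traces a cycloid in ground coordinates, but once 
the hub light's motion (the shared translation) is subtracted, what remains 
is motion on a circle of constant radius. 

```python
import numpy as np

def rolling_wheel(radius, n_frames=200, turns=2.0):
    """Ground-frame positions of hub and rim lights on a wheel rolling without slip."""
    theta = np.linspace(0.0, 2.0 * np.pi * turns, n_frames)  # rotation angle
    # Hub translates horizontally at height `radius`.
    hub = np.column_stack((radius * theta, np.full_like(theta, radius)))
    # Rim light = hub position + rotation about the hub (starts at ground contact).
    rim = hub + radius * np.column_stack((-np.sin(theta), -np.cos(theta)))
    return hub, rim

hub, rim = rolling_wheel(0.5)

# "Factoring out" the common translation: rim motion relative to the hub.
relative = rim - hub
distance_from_hub = np.linalg.norm(relative, axis=1)
```

Here `rim` alone is the cycloid path seen with one light, while `relative` 
is the pure rotation that becomes available when the second light supplies 
the translatory component. 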

Now let us consider whether these observations apply to signing and its 
perception. In American Sign Language (ASL), signs are specified by three 
properties: the shape of the hand or hands, the place of articulation of the 
sign within a signing space, and the movement of the hand or hands. Signs can 
be inflected by modulating the movement. For example, a 'distributional' 
inflection indicating that all of the individuals under discussion are 
affected by some act is produced by sweeping the arm through the central body 
plane. By signing, say, GIVE while making such an arm sweep the signer 
communicates GIVE TO ALL OF THEM. Likewise, a 'temporal' inflection, one 
indicating the repeated occurrence of an act, is produced by rotating the 
wrist about a body-centralized point; with this gesture, GIVE is modified to 
mean GIVE AGAIN AND AGAIN. 

Finally, and most importantly for the current discussion, several 
inflections can be superimposed. Carrying our previous example a step 
further, it proves possible to sign the complexly inflected verb GIVE TO ALL 
OF THEM AGAIN AND AGAIN. This is accomplished by rotating the wrist while the 
arm sweeps through its arc. Notice that when this is done, the optical 
information for the 'temporal' inflection undergoes a radical transformation; 
the wrist no longer rotates about a single point fixed at the center of the 
body, but rather about a point moving with the sweeping arm. It appears that 
observers treat the sweeping motion (common to all points of the hand, wrist, 
and arm) as both specifying one signed event (the 'distributional' 
inflection), and as providing a moving frame of reference for the 
interpretation of the nested 'temporal' inflection. 

Spoken language, with its syntactic units (phonological segments, 
morphemes, words, and syntactic phrases) and its metrical units (syllables, 
feet, phonological phrases; see Selkirk, 1980), lends itself readily to the 
characterization "nested." We will take an example of nested articulatory and 
perceived events from a relatively low-level phenomenon, coarticulation. In 
fluent speech, the productions of successive phonetic segments overlap such 
that the articulatory gestures often satisfy requirements for two or more 
segments at the same time. Typically, for example, unstressed vowels 
coarticulate with the stressed vowels of adjacent syllables. It is therefore 
tempting to think of the production of the unstressed vowels as being nested 
within that of their stressed counterparts, and to think of unstressed vowels 
as being perceived relative to their stressed-vowel context. This way of 
thinking is promoted by findings (Fowler, 1981) that under some conditions 
listeners behave as if they have "factored out" the articulatory/acoustic 
contributions of the context when judging the quality of unstressed vowels, 
more or less as Johansson's subjects seem to have factored out common and 
relative motions in an optical display. 

In trisyllabic nonsense words with medial /bə/, the medial vowel 
coarticulates with both of its flanking stressed vowels such that the F2 of 
/ə/ in, for instance, /ibəbi/ is higher than it is in /ubəbu/. (Compatibly, 
F2 is high for /i/ and low for /u/.) When extracted from their contexts, the 
medial /bə/ syllables do sound quite different, but when presented in context 
they sound alike (more alike, in fact, than do two acoustically identical /bə/ 
syllables presented in different contexts). 

A nested-events account of these data would hold that when the /bə/ 
syllables are extracted from the context in which they had been produced, the 
perceiver has no way to detect (factor out) the contribution that the stressed 
vowels have made to that portion of the acoustic signal in which /ə/ 
correlates predominate over the correlates of other segments, no more than 
Johansson's subjects can separate the rotary from the translatory components 
of movements when they see just the one light on the rim of a wheel. 
Presentation in the context of flanking vowels, on the other hand, allows the 
perceiver to factor out components in common with those vowels, and to 
recognize the quality of what is left. This leads to the perceived identity 
of the acoustically "different" /bə/ syllables (in the /ibəbi/ and /ubəbu/ 
contexts), and to the perceived difference of the acoustically "identical" 
syllables (in the different trisyllable contexts). 

Identifying Speech Events: The Problem of Description 

Several theories of speech perception, including Gibson's (1966) and one 
more familiar to speech investigators, the motor theory (e.g., Liberman, 
Cooper, Shankweiler, & Studdert-Kennedy, 1967), adopt a view consistent with 
an event perspective: namely, that the perceived categories of speech are 
articulatory in origin. Gibson's view is distinguished from the other by its 
working assumption that the perceived articulatory categories are fully 
reflected (however complexly) in the acoustic signal and hence need not be 
reconstructed by articulatory simulations. What are the reasons for this 
major disagreement among theorists who agree on the question of what is 
perceived? One reason may be that they differ in terms of how they describe 
the acoustic signal or even the articulatory event. 

In speech, articulatory activities and their acoustic correlates are both 
richly structured, and consequently can be described in a great many different 
ways. Each of the various descriptions may be most appropriate for certain 
purposes, but none is privileged for all purposes, and just one or a few are 
privileged for the purposes of understanding what a talker is doing and why a 
listener perceives what he or she perceives. A theorist who is convinced that 
the acoustic support for perceptual categories is inadequate may be correct; 
but, alternatively, she or he may have selected a description of articulatory 
events and their acoustic correlates that fails to reveal the support. 

There are many reasons why a particular description might be 
inappropriate for aiding our understanding of speech perception and 
production. It could specify excessive detail (as when, in Putnam's [1973] 
example, information about the positions and velocities of the elementary 
particles of a peg and pegboard is invoked to explain why a square peg won't 
fit in a round hole). Or, for any level of detail, it could be inappropriate 
because it classifies components in ways that fail to capture the talker's 
organization of them or the listener's perceived organization. Appropriate 
descriptions of vocal activity during speech, then, must capture the 
organization imposed by the talker; those of the acoustic signal must capture 
those acoustic reflections of the articulatory organization that are 
responsible for the listener's perception of it. 

Appropriate descriptions of perceived articulatory categories. In some 
time frame, a talker might be said to have raised his larynx (thereby 
decreasing the volume of the oral cavity), abducted the vocal folds, increased 
their stiffness, closed the lips, and raised the body of the tongue toward the 
palate. This description lists a set of apparently separate articulatory 
acts. In fact, however, the first three of them have the joint effect of 
achieving voicelessness; these and the next, lip closure, are the principal 
components of /p/ articulation; and all five acts together are essential to 
the production of the syllable /pi/. Thus, the aggregate of occurrences in 
this time frame has a coordinated structure of relations something like the 
following: {[(larynx raising, vocal cord abduction, vocal cord stiffening) 
(lip closure)] [tongue-body gesture]}. 
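
The difference between the flat list and the coordinated structure can be 
shown with a small data-structure sketch. The nested tuple layout and the 
`flatten` helper are hypothetical conveniences introduced here, not a 
notation proposed in the text; the labels follow the groupings the paragraph 
names (voicelessness, /p/, /pi/). Flattening the structure recovers the same 
five acts, but discards exactly the grouping information that identifies the 
segment and the syllable. 

```python
# Each node is (label, children); a child is either another node or a gesture name.
syllable_pi = (
    "syllable /pi/",
    [
        ("segment /p/", [
            ("voicelessness", [
                "larynx raising",
                "vocal cord abduction",
                "vocal cord stiffening",
            ]),
            ("lip closure", ["lip closure"]),
        ]),
        ("tongue-body gesture", ["tongue-body gesture"]),
    ],
)

def flatten(node):
    """Return the unstructured list of articulatory acts under a node."""
    _label, children = node
    acts = []
    for child in children:
        if isinstance(child, str):
            acts.append(child)          # a single articulatory act
        else:
            acts.extend(flatten(child))  # descend into a coordinated group
    return acts

flat_description = flatten(syllable_pi)
```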

If an investigator settles for the first description (a list of the 
activities of individual articulators), then, from his perspective, 
information about the phonetic segments of an utterance is already absent and 
he cannot expect to find any evidence of it in the acoustic signal. 
Consequently, when a perceiver recovers segments in speech, the recovery must 
be considered reconstructive. Before settling for this conclusion, however, 
the investigator can try standing back a little from his first perspective on 
the vocal tract activity and looking for organizations among gestures that 
were not initially apparent. These organizations will only be revealed from a 
temporal perspective broad enough that coupled changes among the coordinated 
structures can be observed. Certainly if there are coordinative articulatory 
relations among gestures and if the relations have acoustic reflections, then 
the listener is likely to be sensitive to the coordinated structure, rather 
than to the unstructured list of gestures from which it is built. For by 
detecting the structure of the relations among these gestures, the listener 
detects the talker's structure (that is, the featural and phonetic segmental 
structure of the utterance), which is what she or he must do if the utterance 
is to be [...] 

In support of this general approach to phonetic perception, there is some 
evidence that listeners do perceive aggregates of articulatory acts as if 
those acts were coordinated segmental structures. One example of this 
involves the perception of voicing. Following the release of a voiceless stop 
consonant, the fundamental frequency (f0) of the voice is relatively high and 
falls (Halle & Stevens, 1971; Hombert, 1978; Ohala, 1979). Following a voiced 
stop, f0 is low and rises. Although the reasons for this differential 
patterning of f0 are not fully understood (Hombert, 1978; Hombert, Ohala, & 
Ewan, 1979), it is generally agreed that it results from the timing of certain 
laryngeal adjustments and from certain aerodynamic conditions that the talker 
establishes in maintaining voicelessness or voicing during the production of 
the consonant (cf. Abramson & Lisker, 19??). That is, the talker does not 
plan to produce a high falling f0 contour following release of a /p/, but 
rather plans to maintain voicelessness of the consonant, and an unintended 
consequence of that effort is a pitch perturbation following release. 
Compatibly, listeners do not normally hear this pitch difference as such (that 
is, they do not notice a higher pitched vowel following /p/ than /b/). 
Instead, in the context of a preceding stop, a high falling f0 contour in a 
vowel may serve as information for voicelessness of a preceding consonant 
(Haggard, Ambler, & Callow, 1970; Fujimura, 1971), even though, when removed 
from the consonantal context, the f0 contours are perceived as pitch changes 
(Hombert, 1978; Hombert et al., 1979). 

Also suggestive of the perceptual extraction of coordinated articulatory 
structures are occasions when the perceiver seems to be misled. Ohala (197?, 
in press) believes that certain historical sound changes can be explained as 
results of listeners' having failed to recognize some unplanned articulatory 
consequence as unplanned. An example related to the first one is the 
development of distinctive tones in certain languages. These languages 
evolved from earlier versions without tone systems, but with distinctions in 
voicing between pairs of consonants. Over time, the f0 difference just 
described between syllables differing in initial stop voicing became 
exaggerated and the voicing distinction was lost. Ohala's interpretation of 
the source of the change is that in these languages listeners tended to hear 
the f0 differences on the post-consonantal vowels as if pitch had been a 
controlled articulatory variable, rather than an uncontrolled consequence of 
adjustments related to voicing. Therefore, when these individuals produced 
the vowels, they generated controlled (and larger) differences in f0 of voiced 
and voiceless stop-initial syllables. Eventually, because the f0 differences 
had become highly distinctive, the now redundant voicing distinction was lost 
and the words that formerly had differed in voicing of the initial consonant 
now differed in tone. According to Ohala, this process occurred during the 
separation of Punjabi from Hindi. 



Appropriate descriptions of the acoustic signal. Because very little is 
known about how a talker organizes articulation, descriptions of the acoustic 
signal useful for purposes of understanding perception cannot be guided 
strongly by information about articulatory categories. However, we do know 
enough to recognize that the usual method of partitioning the acoustic signal 
into segments or into "cues" can be improved on. Such partitioning often 
obscures the existence of information for the phonetic segmental structure of 
speech because the structure of measured acoustic segments is not coextensive 
with the phonetic structure of the utterance. For one thing, phonetic 
segments as produced have a time course that measured acoustic segments do not 
reflect. The component articulatory gestures of a phonetic segment gradually 
increase in relative prominence over the residual gestures for a preceding 
segment and consequently the acoustic signal gradually comes to reflect the 
articulatory character of the new segment more strongly than that of the old 
one. Thus, phonetic segments are not discrete on the time axis, although they 
can be identified as mutually separate and serially ordered by tracking the 
waxing and waning of their predominance in the acoustic signal (cf. Fant, 
1960). 

Acoustic segments, on the other hand, are discrete. (Such segments are
stretches of the acoustic signal bounded by abrupt changes in spectral
composition.) An individual acoustic segment spans far less than all of the
acoustic correlates of a phonetic segment and, in general, it reflects the
overlapping production of several phonetic segments (cf. Fant, 1960). Looking
at the signal as a series of discrete acoustic segments, then, obscures
another way of looking at it: as a reflection of a series of overlapping
phonetic segments successively increasing and declining in prominence.

Partitioning acoustic signals into acoustic segments also promotes
assigning separate status to different acoustic "cues" for a phonetic feature,
even though such an assignment tends to violate the articulatory fact that
many of these cues, no matter how distinct their acoustic properties may be,
are inseparable acoustic products of the gestures for a single phonetic
segment (Lisker & Abramson, 1964; Abramson & Lisker, 1965). The findings of
"trading relations" among acoustically distinctive parts of the speech signal
indicate that these cues are not separable for perceivers any more than they
can be for talkers. For example, certain pairs of syllables differing on two
distinct acoustic dimensions (the duration of a silent interval following
frication noise and the presence or absence of formant transitions into the
following vocalic segment) are indistinguishable by listeners (Fitch, Halwes,
Erickson, & Liberman, 1980). Within limits, a syllable with a long silent
interval and no transitions sounds the same as one with a short silent
interval and transitions. It is as if the transitions in the second syllable
are indistinguishable from the extra silence in the first. A perceptual
theory in which this observation is natural and expected is difficult to
imagine, unless the theory recognizes that detecting acoustic segments per se
is not all there is to perceiving speech. We would argue that the cues in
these stimuli are indistinguishable to the degree that they provide
information about the same articulatory events. Thus, 24 msec of silence "trades"
with the formant transitions because both cues specify production of /p/. It
is our view that source-free descriptions of acoustics will never succeed in
capturing what a speech event sounds like to a perceiver, because it is
information carried in the signal, not the signal itself, that sounds like
something.

NATURAL CONSTRAINTS ON LANGUAGE FORM



Shifting perspectives from ongoing articulation and its reflections in
proximal stimulation, we considered how, over the long term, properties of the
articulators in speech or of the limbs in sign may have shaped linguistic
forms. Similarly, we considered how perceptual systems and acoustic or
optical media, with their differential tendencies to be structured by various
properties of distal events, may have shaped the forms of sign and speech.

Sign has several regular properties suggestive of natural constraints on
manual-language forms. One (Battison, 1974; cited in Siple, 1978, and Klima &
Bellugi, 1979) takes the form of a symmetry constraint on two-handed signs:
if both hands move in the production of a sign, the shapes and movements of
the two hands must be the same and symmetrical. This constraint is compatible
with anecdotal evidence (from novice piano players, for example), and more
recently with experimental evidence (Kelso, Southard, & Goodman, 1979; Kelso,
Holt, Rubin, & Kugler, in press) that it is difficult to engage in different
activities with the two hands. One reason for this may be a tendency for
actors to reduce the number of independently controlled degrees of freedom in
complex tasks by organizing structures coordinatively (e.g., Turvey, 1977).
Kelso's experiments suggest that the two arms and hands tend to be organized
coordinatively even when such an organization would seem unnecessary or even
undesirable (Kelso et al., 1979; Kelso et al., in press); when subjects were
required to engage in different activities with the two hands or arms, the
"different" movements tended to retain similar properties.

A second constraint, called the "Dominance" constraint by Battison, may
have a similar origin in general constraints on movement organization. For
signs in which just one hand moves and the other hand serves as a base for the
movements (a place of articulation), the base hand must either have the same
configuration as the moving hand or one of a very limited set of other
configurations.

An example of a constraint in spoken languages may be the tendency for
syllable structures to respect a "sonority hierarchy" (e.g., Kiparsky, 1979)
whereby sonority (roughly, vowel-likeness) increases inward toward the vowel
from both syllable edges. Hence, for example, /tr/, a sequence in which
sonority increases from left to right, is an acceptable prevocalic sequence,
but postvocalically the order must be /rt/.

As for language features owing to properties of perceptual systems and
stimulating media, Lindblom's proposed constraints on the evolution of vowel
systems provide an example in spoken languages (1980; see also Bladon &
Lindblom, 1981). Lindblom has proposed that vowel systems maximize the
perceptual distances among member vowels. Based on estimates of distances
among vowels in perceptual space, he succeeds in predicting which vowels will
tend to occur across languages in vowel systems of various sizes. This
implies a constraint on phonological inventories that perceivers be able to
recover distinct phonetic segments when distinct ones are intended. Talkers
cannot elect to realize distinct phonetic segments by using articulatory
gestures (however distinct they may be themselves) that fail to leave
distinguishing traces in the acoustic medium or in the neural medium of
perceptual systems. (Analogous articulatory constraints also operate to shape
vowel systems. Thus, the relatively densely populated front vowel space and
the sparsely populated back vowel space doubtless reflect the relatively
greater agility and precision of movement of the tongue tip and blade compared
with the tongue body.)
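
The dispersion idea can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than Lindblom's actual procedure or data: the vowel coordinates are hypothetical points in an arbitrary two-dimensional "perceptual space," and the cost function (the sum of inverse squared distances between all pairs of vowels) is just one common way of penalizing crowded systems. Under those assumptions, the best-dispersed system of a given size is the one with the lowest cost.

```python
from itertools import combinations

# Hypothetical vowel positions (two perceptual dimensions, arbitrary units).
# These numbers are illustrative assumptions, not measurements.
CANDIDATES = {
    "i": (2.5, 14.0),
    "e": (4.0, 13.0),
    "a": (7.5, 10.5),
    "o": (4.0, 7.0),
    "u": (2.5, 6.0),
    "schwa": (5.0, 10.5),
}


def crowding(system):
    """Sum of inverse squared distances over all vowel pairs in the system.

    Lower values mean the vowels are farther apart (better dispersed).
    """
    total = 0.0
    for v, w in combinations(system, 2):
        (x1, y1), (x2, y2) = CANDIDATES[v], CANDIDATES[w]
        total += 1.0 / ((x1 - x2) ** 2 + (y1 - y2) ** 2)
    return total


def best_system(size):
    """Exhaustively pick the least crowded vowel system of the given size."""
    return min(combinations(sorted(CANDIDATES), size), key=crowding)


if __name__ == "__main__":
    print(best_system(3))  # ('a', 'i', 'u') with the coordinates assumed above
```

With these made-up coordinates, the three-vowel system that minimizes crowding is the corner set /i a u/, which is the kind of cross-language prediction the text describes.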

Lane proposed that similar perceptual and articulatory constraints may
shape the evolution of sign inventories. Facial expressions provide
information in ASL and perceivers tend to focus on a signer's face. This creates a
gradient of acuity peaking at the face. According to Siple (1978), signs made
well away from the face tend to be less similar one to the other than signs
made in its vicinity; in addition, two-handed signs made in the periphery are
subject to the Symmetry and Dominance constraints just described, which
provide redundancy for the viewer who may not see them as clearly as signs
produced near the face. Lane suggested that the relative frequency of signs
in various locations in signing space might be predicted jointly by the acuity
gradient favoring signs located near the face and a work-minimizing constraint
favoring signs closer to waist level.

THE ORIGINS OF SOME LINGUISTIC CONVENTIONS 

As we noted earlier, the conventional rather than necessary relationship
between linguistic forms and their message function is central to the nature
of language, freeing linguistic messages from having to refer to the here and
now, and thereby allowing past, future, fictional and hypothetical events all
to be discussed. For Gibson, this property of language removes it from the
class of things that can be directly perceived:³

     [Perceptual cognition] is a direct response to things based on
     stimulus information; [symbolic cognition] is an indirect response
     to things based on stimulus sources produced by another human
     individual. The information in the latter is coded; in the former
     case it cannot properly be called that (1966, p. 91).

The study group did not discuss language comprehension in relation to
event theory, perhaps because event theory currently offers little guidance on
that subject. However, there was discussion of the origins of some linguistic
conventions. Several examples suggest an origin of certain conventional
relations as elaborations of intrinsic ones. The example of tonogenesis given
earlier illustrates this idea. Ohala proposes that in some languages
distinctive tones originated as controlled exaggerations of the pitch perturbations
on vowels caused by the voicing or voicelessness of a preceding consonant.

A second example is so-called "compensatory lengthening" (e.g., Grundt,
1976; Ingria, 1979), a historical change whereby languages concurrently lost a
final consonant in some words and gained a phonological distinction of vowel
length, with the words that formerly had ended in a consonant now ending in a
phonologically long vowel. In spoken languages, the measured length of vowels
shortens when they are spoken before consonants (e.g., Lindblom, Lyberg, &
Holmgren, 1981). Of course, since vowels coarticulate with final consonants,
this measured shortening may not reflect "true" shortening; presumably,
acoustic evidence of their coarticulating edges is obscured by acoustic
correlates of the overlaid consonant. In any case, the loss of a final
consonant leads to measured lengthening of the vowel. If that unintended
lengthening was perceived as controlled lengthening (just as, hypothetically,
uncontrolled pitch perturbations were perceived as controlled pitch contours),
and was subsequently produced as a controlled lengthening, it could serve as
the basis for a phonological distinction in vowel length.

A final example in speech apparently has an analogue in sign. Some
speech production investigators have proposed that vowels and consonants are
produced by relatively separate articulatory organizations in the vocal tract,
and that vowel production may go on essentially continuously during speech
production, uninterrupted by concurrently produced consonants (e.g., Öhman,
1966; Perkell, 1969; Fowler, 1980). These proposals are based on observations
that vowel-to-vowel gestures that occur during consonant production (Öhman,
1966; Perkell, 1969) sometimes look very similar to vowel-to-vowel gestures in
VV sequences (Kent & Moll, 1972). Also, a relatively separate organization of
vowel and consonant production with continuous production of vowels may
promote such linguistic conventions as vowel infixing in consonantal roots in
Arabic languages (McCarthy, 1981) and vowel harmony in languages including
Turkish (and in infant babbling [e.g., Menn, 1980]).

Vowel infixing will provide an illustration. In Arabic languages, verb
roots are triconsonantal. For example, the root 'ktb' means "write." Verb
voice and aspect (e.g., active/passive, perfective/imperfective) are indicated
by morphemes consisting entirely of vowels. In McCarthy's recent analysis
(1981), the consonantal roots and vowel morphemes are interleaved according to
specifications of a limited number of word templates and a small number of
principles for assigning the component segments to the templates. Some
derivationally related words in Arabic are: katab, ktabab, kutib, and kuutib.
The consonantal root in each case is 'ktb'; the vowel morphemes are 'a'
(perfective, active) and 'ui' (perfective, passive); and the relevant word
templates are CVCVC, CCVCVC, CVVCVC (where C is a consonant and V is a vowel).
The general rules for assigning roots and morphemes to templates are (1) to
assign the component segments left to right in the template, and (2) if there
are more C slots than consonants or more V slots than vowels, to spread the
last consonant or last vowel over the remaining C or V slots. The only
exception to this generalization is 'i' in 'ui', which is always assigned to
the right-most V in the template. Below are two illustrations of verb
formation according to this analysis:



     CVCVC → katab          CVVCVC → kuutib
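
The two assignment rules and the exception for 'i' can be sketched in a few lines of code. This is only an illustrative reading of the rules as stated in the text, not McCarthy's formalism; the function name `associate` and the flag `final_vowel_rightmost` are hypothetical names introduced here. Consonants and vowels are associated to their slots left to right, the last segment spreads over any leftover slots, and, when the flag is set, the final vowel of the melody is first docked on the right-most V slot.

```python
def associate(template, root, vowels, final_vowel_rightmost=False):
    """Fill a CV template from a consonantal root and a vowel melody.

    template: string of 'C' and 'V' slots, e.g. "CVCVC"
    root:     consonants of the root, e.g. "ktb"
    vowels:   vowel melody, e.g. "a" or "ui"
    final_vowel_rightmost: if True, the last vowel of the melody is assigned
        to the right-most V slot before the rest is associated left to right
        (the exception described for 'i' in 'ui').
    """
    c_slots = [i for i, s in enumerate(template) if s == "C"]
    v_slots = [i for i, s in enumerate(template) if s == "V"]
    out = [None] * len(template)

    # Rule 1 and 2 for consonants: left to right, last consonant spreads.
    for n, slot in enumerate(c_slots):
        out[slot] = root[min(n, len(root) - 1)]

    melody = list(vowels)
    if final_vowel_rightmost and len(melody) > 1:
        # Exception: dock the final vowel on the right-most V slot first.
        out[v_slots[-1]] = melody.pop()
        v_slots = v_slots[:-1]

    # Rule 1 and 2 for vowels: left to right, last vowel spreads.
    for n, slot in enumerate(v_slots):
        out[slot] = melody[min(n, len(melody) - 1)]

    return "".join(out)


if __name__ == "__main__":
    print(associate("CVCVC", "ktb", "a"))          # katab
    print(associate("CCVCVC", "ktb", "a"))         # ktabab
    print(associate("CVCVC", "ktb", "ui", True))   # kutib
    print(associate("CVVCVC", "ktb", "ui", True))  # kuutib
```

The four calls reproduce the derivationally related forms cited in the text (katab, ktabab, kutib, kuutib) from the single root 'ktb'.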

fmgl discussed an analogous system in ASL. A particular root morphexee 
can be associated with different sign templates to express derivationally or
inflectionally related versions of the morpheme. The templates have slots for
locations (L) and movements (M), where the former specify person and number
and the latter specify aspect. To take an example, the template that
underlies I GIVE TO HIM is (LM + ML). Movements and locations are assigned to
it as in McCarthy's analysis:

A template can include several L's and M's, more, in fact, than there are
distinct movements in a root morpheme. In this case, the movements of the
root morpheme are assigned left to right in the template until they are
exhausted, and then the right-most movement spreads to fill the empty M slots.
In I GIVE TO X, Y, AND Z the template and assignments of root morpheme
movements are as follows:



[Diagram: template and movement assignments for I GIVE TO X, Y, AND Z; not recoverable.]

Analyzed this way, the meshing of movements and locations is similar to
the meshing of vowels and consonants in languages with infixing and vowel
harmony systems. This leads to the question of whether the system is favored
as a linguistic device, and, if so, whether it is favored by virtue of the
signer's motor organization for producing it. It might be favored, for
example, if the motor organization underlying sign production readily produced
cyclic repetitions of a movement (as those underlying stepping, breathing,
chewing and perhaps vowel production do), and if minimal adjustments to the
organization would enable shifts in location without changing the form of the
movement.



A scan of the various conference addresses shows the close ties between
the event approach and Gibson's theory of perception. Indeed,
Gibson's radical rethinking of classic perceptual problems includes the notion
that a perceiver does not operate in a series of "frozen moments," but rather
in an ongoing stream of events. We therefore thought it useful to examine the
ecology of the speech event, and, in doing so, we were reminded that both the
speaker and the listener (the signer and the observer) have a stake in the
success of a communicative episode. This is a rather unique circumstance; it
invites both a familiar analysis of the perceiver as an active seeker of
information (cf. Gibson, 1966), and a less familiar analysis of the producer
as an active provider of informational support.



As to the perceiver's active role, we first of all see behavior intended
to enhance signal detection: the head can be rotated to an optimal
orientation, the source can be approached, and so on. Beyond this there can be
direct communicative intervention; that is, the perceiver can make requests
for repetition or clarification. On the producer's part, there are the
well-known redundancies of language: in essence, more than enough information is
provided to ensure the accuracy of communication. Also, perhaps to avoid
syntactic ambiguities, the talker may provide careful prosodic marking for
clause boundaries and the like (e.g., Cooper & Paccia-Cooper, 1980).
Finally, a talker will enunciate more clearly (and a signer gesture more
distinctly) when there is a great distance to the perceiver or when the
message context makes a particular word unpredictable.

CONCLUDING REMARKS: RESEARCH FROM AN EVENT PERSPECTIVE

If there is a theme to the event conference, it is surely that
psychologists have paid too little attention to the systematic (and
potentially informative) nature of change. With respect to speech, this can be seen in
the common practice of decomposing the speech stream into a succession of
discrete acoustic segments (e.g., release bursts, aspiration, formant
transitions, and the like). A whole literature speaks, in turn, of the difficulty
in bringing these acoustic segments into some correspondence with linguistic
segments. In the case of sign, the perceptual significance of change was
overlooked in early attempts to devise sign glossaries; investigators were
preoccupied with cataloguing the featural properties of hand shapes and failed
at first to recognize the importance of the gestures being made with the hands
(Klima & Bellugi, 1979, chapter 12 and passim; Bellugi & Studdert-Kennedy.

The members of our group were agreed that a shift of emphasis is needed:
investigators of both speech and sign should give greater consideration to the
time-varying properties of those events. To begin with, this will involve
focusing on the dynamics of the source events themselves. These
investigations of the source can suggest compatible and appropriate perceptual
analyses. Some research using Johansson's point-light techniques to study the
coordinated activities of the signer, and the perception of lexical movements
and inflections (e.g., Poizner, Bellugi, & Lutes-Driscoll, 1981), seems to
offer promising beginnings for such an approach.

Alternatively, analyses of time-varying properties of the signal may
provide guidance in understanding the ways in which talkers and signers
structure articulatory activity (cf. Fowler, 1979; Tuller & Fowler, 1980).
On this issue, our group spent a good deal of time considering the recent work
of Remez, Rubin, Pisoni, and Carrell (1981; Remez, Rubin, & Carrell, 1981).
They have shown that the phonetic message of an utterance can be preserved in
sinewave approximations that reproduce only the center frequencies of its
first three formants. These stimuli have no short-time acoustic constituents
that vocal tracts can produce and consequently lack many acoustic elements
heretofore identified by investigators as speech cues. Presumably the stimuli
are intelligible because information is provided by relations among the three
sinusoids, information that the sinusoidal variations are compatible with a
vocal origin.

These findings are important not because they show short-time acoustic
cues to be unimportant to speech perception; after all, naive listeners did
not spontaneously hear the sinewaves as phonetic events. Instead, the
findings are important in showing that time-varying properties of the signal
can provide sufficient information for word and segment identification in
speech. In this respect, as Remez and Rubin point out (Note 1), their
demonstration is closely analogous to Johansson's demonstrations with
point-light displays of moving figures. In both demonstrations, change provides
essential information for form.



The conclusion we draw from all of the examples considered here is that
students of language should not be misled by the timeless quality of
linguistic forms. Signing and speaking are coherent activities and natural
classes of events. It is only reasonable to expect that the signatures of
these events will be written in time as well as space.

REFERENCE NOTE 

1. Remez, R. E., & Rubin, P. E. The stream of speech. Paper distributed at
the First International Conference on Event Perception, Storrs, Ct.,

FOOTNOTES

1 We do not intend to suggest by the word conventional that the linguistic
aspects of utterances have been established by popular acclaim. We intend
only to distinguish the linguistic aspects from the physical aspects in terms
of their "relative arbitrariness." Let's consider a physical example
first: the articulatory and acoustic differences between the versions of /d/
in /di/ and /du/ are necessary and lawful, given the nature of vocal tracts.
This contrasts with the aspiration difference between the versions of /p/ in
"pie" and "spy," the production of which is required of English speakers only
by convention or rule. We know this to be the case since speakers of other
languages (e.g., French) make no such distinction.

2 In Gibson's view:

     The relation of a perceptual stimulus to its causal source
     in the environment is of one sort; the relation of a
     symbol to its referent is of another sort. The former
     depends on the laws of physics and biology. The latter
     depends on a linguistic community, which is a unique
     invention of the human species. The relation of perceptual
     stimuli to their sources is an intrinsic relation such
     as one of projection, but the relation of symbols to their
     referents is an extrinsic one of social agreement. The
     conventions of symbolic speech must be learned, but the
     child can just about as easily learn one language as
     another. The connections between stimuli and their
     sources may well be learned in part, but they make only
     one language, or better, they do not make a language at
     all. The language code is cultural, traditional and
     arbitrary; the connection between stimuli and sources is
     not (p. 91).

3 It is interesting in this regard that theories of perception developed
within the information-processing framework have relied almost exclusively on
verbal materials as stimuli and propose that perception is indirect.



FRICATIVE-STOP COARTICULATION: ACOUSTIC AND PERCEPTUAL EVIDENCE



Bruno H. Repp and Virginia A. Mann* 



Abstract. Eight native speakers of American English each produced
10 tokens of all possible CV, FCV, and VFCV utterances with V = [a]
or [u], F = [s] or [ʃ], and C = [t] or [k]. Acoustic analysis
showed that the formant transition onsets following the stop
consonant release were systematically influenced by the preceding
fricative, although there were large individual differences. In



(e.g., /sta/) and VFCV (e.g., /asda/) utterances; that is, they were
not reduced when a syllable boundary intervened between fricative
and stop. In a parallel perceptual study, the CV portions of these
utterances (with release bursts removed to provoke errors) were
presented to listeners for identification of the stop consonant.
The pattern of place-of-articulation confusions, too, revealed
coarticulatory effects due to the excised fricative context.



In two previous papers (Mann & Repp, 1981; Repp & Mann, 1981) we
described an effect of a preceding fricative on stop consonant perception:
When a stimulus ambiguous between [ta] and [ka] was preceded by a fricative
noise appropriate for [s] (plus a brief silence appropriate for stop closure),
listeners reported [ska] more often than [sta]. A preceding [ʃ] noise, on the
other hand, had little effect on the perceived place of stop articulation. In
a series of experiments, we eliminated several possible explanations of the
contrasting effects of [s] and [ʃ], such as a simple response bias, auditory
contrast, or direct cues to stop place of articulation in the fricative noise.
We concluded that the perceptual context effect most likely reflects
listeners' expectation of a coarticulatory interaction between a stop
consonant and a preceding fricative, namely, a shift in place of stop consonant
articulation towards that of the fricative.



*Also Bryn Mawr College.

In our second paper (Repp & Mann, 1981), we reported data that supported
this hypothesis. Starting with fricative-stop-vowel utterances obtained from
a single speaker, we examined listeners' stop consonant perception after the
fricative noise and the stop release burst had been removed. The stops in
these truncated CV syllables were more often perceived as having a relatively
forward place of articulation when the excised fricative had been [s] than
when it had been [ʃ]. In addition, acoustic measurements of the same stimuli
showed that the onset frequency of the second formant (F2) following the stop
release was lowered by about 100 Hz in the context of [s], relative to [ʃ]
context. A possible difference in F3 onset in the opposite direction was also
indicated. Thus, F2 and F3 onsets were more widely separated in [s] context
than in [ʃ] context, a pattern that is consistent with the hypothesized
forward shift in place of stop articulation following [s], considering the
well-known fact that F2 and F3 onsets are more widely separated in [ta] than
in [ka].

While these data suggested that fricative-stop coarticulation can occur,
their generality was uncertain. In the present paper, we report acoustic
measurements and supplementary perceptual tests using utterances collected
from eight new speakers.



ACOUSTIC MEASUREMENTS



Method

Speakers. Four males (AA, LL, BM, WC) and four females (VM, SP, PP,
FBB), all native speakers of American English, were enlisted. They included
two senior phoneticians (AA, LL), an experienced speech scientist (FBB), a
graduate student in phonetics (PP), and four speakers with little formal
training.



Table 1
The Set of Utterances Used

[Table body (phonetic notation and spelling of each utterance) not recoverable.]



Utterances. The experimental utterances included all possible
combinations of an initial vowel ([a], [u], or absent), a fricative ([s], [ʃ], or
absent), a stop ([t] or [k]), and a final vowel ([a] or [u]), with the
restriction that the two vowels, if present, be the same. Table 1 lists the
individual utterances, both in phonetic notation and in the spelling in which
they were read by the subjects. Note that the stop consonants, although
unaspirated in both FCV and VFCV contexts, were phonologically voiceless in
FCV utterances where they were part of a syllable-initial fricative-stop
cluster, but phonologically voiced in VFCV utterances where they were in
syllable-initial position.¹ Thus, this set of utterances enabled us to assess
not only the effect of a preceding fricative on stop articulation but also the
sensitivity of that effect to the presence of an intervening syllable
boundary.

Ten randomized lists of these utterances were typed on a sheet of paper.
The lists included four other utterances ([sa], [ʃa], [su], and [ʃu]) whose
analysis we will not report here. The CV syllables ([ta], [ka], [tu], [ku])
were added after speakers VM and SP had been recorded; thus, CV data were
available for six speakers only.

Recording procedure. The utterances were produced in a soundproof booth
in front of a Shure dynamic microphone and recorded on a Crown 800 tape
recorder. Speakers were given sample pronunciations by the experimenter and
were instructed to read at an even pace and as naturally as possible.
Speakers varied in their assignment of stress in the disyllabic (VFCV)
utterances: Three (AA, LL, WC) stressed the second syllable while the other
five stressed the first syllable. This unintended variation in stress offered
the opportunity to observe any possible effects of this variable.

Measurement procedure. Individual utterances were input from audio tape
to a Federal spectrum analyzer. The results of the spectral analysis
were stored in the memory buffer of a GT-40 computer and displayed on a
Hewlett-Packard oscilloscope. By using a cursor below a spectrogram of the
whole utterance, individual time frames could be selected whose smoothed
average spectrum was displayed above the spectrogram, while the corresponding
portion of the digitized waveform appeared on a second screen. Thus, the
selection of frames for spectral analysis was guided by both waveform and
spectrographic information. Spectral cross-sections were computed over a
25.6 msec time frame; the step size from one frame to the next was 12.8 msec.
Tha apaatha **$ diaplayad aa a point plot with a raaoluti<m of tit. 
Spectral peaks corresponding to formants were determined from this display by
eye and noted down by hand. Appropriate adjustments were made for asymmetric
shapes of formant peaks. Occasional multiple peaks due to a formant straddling
two or more individual harmonics were averaged. In doubtful cases, the
spectra of the preceding and following time frames were taken as a guideline.

Because of the laborious nature of this manual procedure, the
measurements had to be restricted to the most crucial aspects of the stimuli: the
onset frequencies of F2 and F3 (and in some cases, F4) following stop
release. Since the release burst of the stop usually showed a highly
irregular spectrum (especially for alveolar stops), it was ignored, and
measurements were taken from the first frame that showed a clear formant
pattern, normally including F1 (signifying the onset of voicing). Additional
measurements were taken from the next two frames (only from the next frame in
the case of speaker AA whose utterances were the first measured), so that
formant transitions were tracked over approximately 50 msec.

Figure 1. Formant transition patterns for individual speakers' productions of
          [ta], [ka], [tu], and [ku], averaged over the five different contexts
          and depicted as trajectories in the F2-F3 plane (axes in kHz). Data
          for [ku] are missing from speakers AA, LL, and PP, due to unreliable
          F3 measurements.

Note that this procedure provides a conservative estimate of
coarticulatory effects due to the fricative, since any such effects are likely to be
most pronounced at the point of stop release and to decrease with distance
from the release. Although coarticulatory changes in the release burst may
exist (cf. Repp & Mann, 1981, for indirect evidence), they cannot be assessed
easily by the present method. Thus, the present investigation was concerned
solely with coarticulatory changes in the formant transitions following the
release burst.



The raw data consisted of the frequencies of F2 and F3 (and, sometimes,
F4) for three (two in the case of AA) consecutive frames of each of ten tokens
of 20 utterances (16 in the case of VM and SP) produced by eight speakers.
Missing data due to omissions, mispronunciations, or gross acoustic anomalies
were rare. A more common source of missing data was the weakness of some
formants in certain utterances, particularly F3 in utterances containing [ku].
For some speakers, as noted below, no reliable data for F3 could be obtained
in these instances.

Results and Discussion

The measurements of F2 and F3 in FCV and VFCV utterances were subjected
to separate 5-way analyses of variance, with the factors Syllable Boundary
(FCV vs. VFCV), Fricative ([s] vs. [ʃ]), Vowel ([a] vs. [u]), Stop ([t]
vs. [k]), and Time (3 frames). Speaker AA was not included in these analyses
because of missing data.

Figure 1 gives an impression of the general frequency characteristics of
the formant transitions, regardless of preceding context. The transitions are
depicted as trajectories in the F2-F3 plane, separately for each speaker's
productions of [ta], [ka], [tu], and [ku], averaged over the five contexts:
[-], [s-], [ʃ-], [as-] (or [us-]), and [aʃ-] (or [uʃ-]). Except for the few
cases with missing data points, each trajectory is based on three points in
time, separated by 12.8 msec, with 50 measurements per point. In the left
panel, it can be seen that all speakers had falling F2 transitions in both
[ta] and [ka], but two different patterns emerged for F3: For five speakers
(LL, BM, WC, SP, PP), the F3 transitions were falling for [ta] and slightly
falling for [ka]; for the remaining three speakers (AA, VM, FBB), F3 was
completely flat for [ta] but rising for [ka]. These individual differences
may indicate that the second group of speakers produced [a] with a relatively
high F3. In the right panel, we see that all speakers (except for VM in [ku])
showed falling F3 transitions in [tu] but a flat F3 in [ku]. Note that after
about 50 msec of formant movement, the formants of [ta] and [ka], and of [tu]
and [ku], were still widely separated, suggesting rather long formant
transitions and/or variations in vowel quality dependent on the preceding stop
(particularly in [u]).

The trends shown in Figure 1 are all highly significant, and they are
generally in agreement with other data in the literature. We will not dwell
on them here, as our primary concern was the effect of preceding fricative
context. We examined this effect in terms of the differences in formant onset
frequencies following [s] and [ʃ].

Table 2 shows these differences (in Hz) for F2, broken down by individual
utterance pairs and speakers, but averaged over the three time frames. A
positive difference indicates that F2 was higher following [s] than following
[ʃ]. Italics indicate differences that were significant at the p < .01 level
in individual t-tests. It can be seen that, on the average, F2 was 4 Hz lower
following [s] than following [ʃ], a nonsignificant difference. Nevertheless,
out of 64 individual comparisons, 20 were significant, a proportion far
exceeding chance. Of these 20 differences, 8 were positive and 12 negative,
which confirms the absence of any general trend. Since there was no pattern
in the data, these significant coarticulatory effects must be considered
entirely idiosyncratic.
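
As a rough check on the claim that 20 significant results out of 64 far
exceed chance, the expected count and the binomial tail probability can be
computed directly. This is a minimal sketch that is not part of the original
analysis: it assumes, for simplicity, that the 64 t-tests are independent
(they are not strictly so, since each speaker contributes several
comparisons), and the helper name `binomial_tail` is purely illustrative.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_comparisons = 64  # individual t-tests reported in the text
n_significant = 20  # comparisons significant at the .01 level
alpha = 0.01        # per-test significance level

# Number of "significant" results expected if no true effect existed
expected_by_chance = n_comparisons * alpha

# Chance probability of at least 20 significant results (independence assumed)
tail = binomial_tail(n_comparisons, n_significant, alpha)

print(f"expected by chance: {expected_by_chance:.2f}")
print(f"P(at least {n_significant} of {n_comparisons}) = {tail:.3g}")
```

Under the independence assumption, fewer than one significant comparison
would be expected by chance, so the observed count cannot plausibly be
attributed to chance alone even though the direction of the differences was
inconsistent.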

In the analysis of variance, however, there was a significant triple
interaction between Fricative, Stop, and Time, F(2,12) = 14.0, p < .001: The
F2 transitions of alveolar stops started an average of 40 Hz lower in [s]
context than in [ʃ] context, and this difference diminished over time. The F2
transition of velar stops, on the other hand, was essentially unaffected by
fricative context. No other effect involving the Fricative factor was
significant, except for one marginally significant 4-way interaction with no
clear associated pattern.

The F3 measurements are shown in Table 3. The picture was quite different
here. On the average, F3 was 46 Hz higher following [s] than following [ʃ],
F(1,6) = 51.8, p < .001. Of the 64 individual comparisons, 28 were
significant, and every single one of them was positive. Thus, even though
there was considerable variability across speakers and tokens, the evidence
for coarticulatory variation in F3 is very strong. The correlation between
the entries in Tables 2 and 3 is -0.07, indicating no relation between
context-induced shifts in F2 and in F3.
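
The contrast between the two formants can be illustrated with a simple sign
test on the significant differences (8 positive vs. 12 negative for F2; 28
positive vs. 0 negative for F3). The sketch below is an illustration only and
was not part of the reported analysis; it assumes independent comparisons,
and `sign_test_two_sided` is a hypothetical helper name.

```python
from math import comb


def sign_test_two_sided(n_positive: int, n_negative: int) -> float:
    """Two-sided sign test: chance of a split at least this uneven
    when positive and negative differences are equally likely."""
    n = n_positive + n_negative
    k = max(n_positive, n_negative)
    one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * one_tail)


# Counts of significant differences as reported in the text
p_f2 = sign_test_two_sided(8, 12)  # F2: no consistent direction
p_f3 = sign_test_two_sided(28, 0)  # F3: every significant difference positive

print(f"F2 sign test: p = {p_f2:.3f}")
print(f"F3 sign test: p = {p_f3:.3g}")
```

Under these assumptions the F2 split is unremarkable, whereas a 28-to-0 split
would be extremely unlikely if positive and negative shifts were equally
probable.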

The coarticulatory effect on F3 did not decrease over time, suggesting
that fricative context may have influenced not only the articulation of the
following stop but also that of the following vowel. Two interactions
involving the Fricative factor reached significance in the analysis of
variance. One, between Fricative, Syllable Boundary, and Time, F(2,12) = 4.2,
p < .05, revealed that the coarticulatory effect increased over time in FCV
utterances but did not change at all over time in VFCV utterances. According
to the second interaction, between Fricative, Vowel, Stop, and Time, F(2,12)
= 8.0, p < .01, the coarticulatory effect increased over time in [u] context
and for alveolar stops in [ɑ] context, but decreased over time for velar
stops in [ɑ] context. The reasons for these complex patterns are not clear.


Table 4 shows the F4 measurements, which were obtained for only five
speakers and yielded reliable data for only about half the comparisons (mostly
those involving stops preceding [u]).² Nevertheless, the pattern was very
clear: Out of 19 individual comparisons, 18 were positive, and 13 of these
were significant. Thus, there was a clear tendency for F4 to be higher
following [s] than following [ʃ]. This tendency seemed to be even stronger
than that for F3, the average difference in Table 4 being more than twice as
large (102 Hz) as that in Table 3. However, the changes in F3 and in F4
were not significantly correlated (r = 0.21).

Table 2



Coarticulation Effects on F2: [F2]s - [F2]ʃ in Hz.

Utterances

[stɑ]-[ʃtɑ]
[skɑ]-[ʃkɑ]
[stu]-[ʃtu]
[sku]-[ʃku]
[ɑstɑ]-[ɑʃtɑ]
[ɑskɑ]-[ɑʃkɑ]
[ustu]-[uʃtu]
[usku]-[uʃku]
Mean







fife 


Speakers 


SP 








AA 


LL 


VG 


VM 


PP 


FBB^ 


He an 


10 


-11 


221 


-65 


32 


-24 


-21 


62 


r? 


36 


-13 


1 


85 


52 


8 


0 


17 


23 


98 


5 


-64 


21 


-76 


-12 


-47 


-44 


-8 


4 


-20 


70 


7 


49 - 


■ 164 


-44 


-147 


-30 


4 


z2i 




-57 


-13 


15 


-3 


-4 


-20 


ill 


51 


3 


' -3 


137 


44 


-33 


40 ' 


46 


-22 


9 


-81 


=S1 


-15 


4 


21 


-71 


-30 


-10 


9 


-8 


-15 


-31 


-1 


33 




-8 


31 


-1 


-22 


-7 


17 


-16 


-12 


-2 a 


-4 



Note: Underlines indicate difference is significant (p < .01) by t-test.



Table 3 



Coarticulation Effects on F3: [F3]s - [F3]ʃ in Hz.

Utterances

[stɑ]-[ʃtɑ]
[skɑ]-[ʃkɑ]
[stu]-[ʃtu]
[sku]-[ʃku]
[ɑstɑ]-[ɑʃtɑ]
[ɑskɑ]-[ɑʃkɑ]
[ustu]-[uʃtu]
[usku]-[uʃku]
Mean









Speakers 




AA 


LL 


RH 


VG 


, VM 


3P 


-20 


101 


(54) 


•a 


43 


37 


86 


1 


76 


61 


-21 


64 


21 


(If) 


111 


67 


28 


M 






12 


(19) 


0 


71 




33 


-24 


21 


12 


21 


(65) 


8 


104 


40 


-55 


11 


108 


11 


15 


64 


88 


24 






25 


19) 


-29 


25 


60 


54 


48' 


50 


8 


43 



PP 

27 
29 
75 
112 
-1 

22 
125 
(46") 

.62 



FBB 

117 

51 

.-9 
11 
145 

55 
1 

55 



Hean 

50 
43 
6* 
(44) 
*3 
37 
6! 

(22) 
ft6 



Note: Underlines indicate difference is significant (p < .01) by t-test.
Differences in parentheses are based on a small number of tokens only.

Table 4



Coarticulation Effects on F4: [F4]s - [F4]ʃ in Hz.

Utterances

[stu]-[ʃtu]
[sku]-[ʃku]
[ustu]-[uʃtu]
[usku]-[uʃku]



m 

35 

-1 
2i 



Speakers 

SP PP 



in m 

16 121 



m 

27 



100 36 83 260 
105 89 m 199 



FBb 

«7 

8« 



Note: Underlines indicate difference is significant (p < .01) by t-test.



Table 5

Confusion Matrices for Truncated Stops in [ɑ] and [u] Context

                      V = [ɑ]                          V = [u]
Utterance    "b"  "th"  "d"  "g"  "-"        "b"  "th"  "d"  "g"  "-"

[(s)tV]       16   13   55   10    6           6    5   80    8    1
[(ʃ)tV]       16    9   52   17    6           3    6   30    9
[(s)kV]       2*    8   21   •i    6          63         3   19   11
[(ʃ)kV]       26    6    u   *6    8          70    3    2   T«   11
[(Vs)tV]       6   13   6*    9    8           7    3   84    5
[(Vʃ)tV]       9   10   6?   12    6           3    3   87    6    i
[(Vs)kV]      10   10   32   12    6          52    5    5   29
[(Vʃ)kV]      *«    8   30   «1    ?          62    3    4   23    6
A comparison of the F3 data from each fricative context with the
measurements for CV utterances did not confirm our expectation (based on the
earlier perceptual data) that the coarticulatory effect would be primarily due
to [s]. On the contrary, the data suggest that it was almost entirely due to
[ʃ]. However, this difference was in large measure due to a single subject
(PP), and because this analysis could be done on five speakers' utterances
only, the effects did not reach statistical significance.

We recognize that it is difficult to infer articulatory processes from
acoustic data. Given our hypothesis that the place of stop articulation
shifts towards that of the preceding fricative (Repp & Mann, 1981), one might
expect that the formant transitions of a stop following [s] would be more
[t]-like (indicating a forward shift) than those of a stop following [ʃ],
which would be more [k]-like (indicating a backward shift). Since [t] has a
somewhat higher F3 onset than [k] in both vocalic contexts (cf. Figure 1), our
finding of a higher F3 onset following [s] is consistent with these
expectations. What is not consistent is (1) the absence of any coarticulatory
shifts in F2, particularly in [-u] context where [t] and [k] are characterized
by widely differing F2 frequencies (cf. Figure 1), and (2) the finding of
higher F4 onsets following [s], for our data indicate that F4 is considerably
higher in [ku] than in [tu], with less difference between [kɑ] and [tɑ]. In
view of these ambiguities, we turned to a perceptual test in the hope that it
might shed some light on the direction of the shifts in stop place
articulation.

PERCEPTUAL DATA

To complement our acoustic measurements, we gathered perceptual data for
a subset of the utterances described above, supposing that labeling responses
to FCV and VFCV utterances from which the fricative noise and release burst
had been removed might provide another means of assessing any coarticulation
between fricative and stop, a procedure used successfully by Repp and Mann
(1981). We began by focusing only on those utterances that included the
vowel [ɑ], but later extended our experiment to utterances containing [u].

Method

Subjects. The subjects were ten students from Bryn Mawr and Haverford
Colleges, all native speakers of English, of whom eight were paid volunteers
and two were participating as part of a class project.

Stimuli. To create the truncated CV syllables, the utterances were
digitized at ^0 kHz using the Haskins Laboratories PCM system. Individual
utterances were displayed on a storage oscilloscope, and the beginning of the
first clear pitch pulse following the stop release burst was located in the
waveform. Only the stimulus portion following that point was retained. The
burst duration (from burst onset to the cutoff point) was recorded. This was
done for five tokens of each of all eight speakers' utterances containing the
vowel [ɑ], and for four speakers' (AA, LL, PP, FBB) utterances containing the
vowel [u].³



The truncated CV syllables were assembled into sequences and recorded
onto audio tape. A separate tape was created for each speaker and for each
vowel, each tape containing 5 repetitions of each of the stimuli (5 tokens of
each of 8 utterances) in separately randomized blocks. The interstimulus
interval was 2.5 sec, with a pause between blocks.

Procedure. All subjects participated in two different sessions of
approximately one hour. The [ɑ] tapes were divided between the two sessions
and were played in a fixed order of speakers. Six of the subjects returned
for a third session in which all of the [u] tapes were played. The stimuli
were presented in a quiet room over earphones. Subjects were required to
label each stimulus as containing an initial "b", "d", "g", "th" (as in
that), or, if necessary, "-" (no consonant).

Results and Discussion

The data obtained with speaker SP's [ɑ] utterances were excluded from
analysis because listeners found it difficult to hear any stops and responded
fairly randomly. The combined confusion matrix for the remaining seven
speakers' [ɑ] utterances is shown in the left half of Table 5. Comparing
utterances differing only in the nature of the original fricative, it is
evident that "d" (and "th") responses were somewhat more frequent when the
fricative context had been [s], and that "g" (and "b") responses were more
frequent when the fricative context had been [ʃ]. Except for the trend in the
"b" responses, this pattern is consistent with our hypothesis that [s] leads
to a forward shift in the place of articulation of a following stop.

Responses of "d" and "g" were subjected to separate 4-way analyses of
variance with the factors Speaker, Stop ([t] vs. [k]), Fricative ([s] vs.
[ʃ]), and Syllable Boundary (FCV vs. VFCV). We discovered that, while the
effect of fricative context on "d" responses did not reach significance, that
on "g" responses did, F(1,9) = i**5, p < .0^. However, the extent of the
difference varied across speakers, F(6,54) = 83*, p < ^0*. It was greater
for alveolar stops than for velar ones, F(1,9) = d-i, p < .0^, and greater
for FCV utterances than for VFCV utterances, F(1,9) = ^i*, p < .01. Several
other statistical interactions were significant, indicating much variability
among utterances produced by different speakers, but consistency in subjects'
perception.

To see whether the speaker variability in the perceptual data related to
the similar variability observed in the acoustic measurements, we subtracted
the percentage of "g" responses (which had shown a significant effect of
fricative context) for each utterance that had contained [s] from that for
the corresponding utterance that had contained [ʃ], and then correlated these
difference scores (four values for each of 7 speakers) with the F3 difference
measures of Table 3. The correlation was positive and significant, rc?^> -
^*a 4 ^ ^ .Oi?. Thus, pairs of utterances showing a relatively large acoustic
effect of fricative context (higher values of F3 following [s]) tended to
show a larger difference in "g" responses (i.e., fewer "g" responses to
utterances that originally included [s]).
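
The difference-score computation just described can be sketched as follows.
All numbers in the example are hypothetical placeholders rather than values
from Tables 3 and 5, and the function and variable names are assumptions made
for illustration; the sketch only shows how the perceptual difference ("g"
responses after [ʃ] minus "g" responses after [s]) is paired with the
acoustic F3 difference and entered into a Pearson correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical percentages of "g" responses for four utterance pairs
g_after_s = [10.0, 40.0, 9.0, 42.0]    # original fricative was [s]
g_after_sh = [13.0, 48.0, 13.0, 54.0]  # original fricative was the palatal one

# Hypothetical F3 onset differences (after [s] minus after the palatal), in Hz
f3_diff_hz = [20.0, 60.0, 35.0, 90.0]

# Perceptual difference score: "g" after the palatal minus "g" after [s]
perceptual_diff = [sh - s for s, sh in zip(g_after_s, g_after_sh)]

r = pearson_r(perceptual_diff, f3_diff_hz)
print(f"perceptual differences: {perceptual_diff}")
print(f"r = {r:.2f}")
```

In the study itself the same pairing would be carried out over all
difference scores of the seven speakers rather than over four invented
values.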

The confusion matrix for the [u] utterances is shown in the right half of
Table 5. There we see that alveolar stops were most often identified as "d",
but truncated velar stops received predominantly "b" responses, a finding that
may 6a stplaiasd b/ tna #iailar;ty of tHa (a^aiiy aimaialy fof*af»t fw» - 

M4 <?f;#; 



uo** of a*$i*i %m nop§ \* f -l f :m%*t\ **f t**i*f~P*rv t ^ i ^ . 

together with a possible listener bias to respond "b" in this context. The
table reveals little systematic variation contingent on the original fricative
context, except for a trade-off between "b" and "g" responses to velar stops:
When the preceding fricative had been [s], "b" responses were less frequent,
and "g" responses more frequent, than when it had been [ʃ]. These differences
(reflected in Stop by Fricative interactions) were significant in analyses of
"b" responses, F(1,5) = 18.4, p < .01, and of "g" responses, FH^j # g *.
However, there were a number of significant interactions with other factors,
especially with Speakers, reflecting again high between-speaker variability
coupled with relatively low between-listener variability. There was no
significant correlation with the acoustic measurements for [u] utterances.



CONCLUSIONS

The results of the present studies, even though they are based on a very
large amount of data, are not quite as clear as we had hoped. Nevertheless,
two conclusions seem appropriate. First, we have obtained rather solid
acoustic evidence for a coarticulatory shift in stop production contingent on
preceding fricative context. This shift was reflected in generally higher
onset values of F3 and F4 following [s] than following [ʃ]. Second, we have
f&yfid acamor*} «?J<i#f*€H* for fr icatf *£~Ifctf*K*ad ***Jfis in stop prod^tioft 
I * stanar * 4 pareapuo* of tr*# v^alle fof##ftt if#f(altiof»» # although ift# oarr#~> 
IBlt&n mt***n tM #cm*stl<? iwd p#ro#pty#i fiftdiingi t^iS V#r t»bl 1 4 1 y of 

ootrti^vlttor r #ff#f*ti «ro?f sp^tie^^* £ofr#*s iras un#<p«ct941y l«rg# 

l>«fo^ i\m*t*lf< mi%n*f tfr* #co^*tlo w tM» p#re#ptu#l d#t# * airtight- 

for*r#ra #r tieulato^y ifttff pr#tnt iOf ? , *tel?h op*n lh# qu#ati&r> of *h#tfi#* 

of «op i^ui #t i*>f* ^#^J shifts toward that sf # pr^^#diftg 

?f*Z'jm&&lf* dl^i»et o?>iM»f »#tl^j$ of §p*#&h pro^fu^flo^ tight on 

1^ ^/f stud f #9 * m ftMj/datiftft f^r frii*i fMftr***^ 

r#*##r^^i^/ »%t»^i I *l^ing ff l^iti /^-<?t^ip lof 3 9^ « f^sl p***f^^r,^*i ]^ 

» f t It v l lit j O ft StOp ^OI5fOMSta tfcpjhi i*tl*4 iV^?Of*i 4i9Wrt 

C0»f/T5tT^' ~ " ^ 

t 4 A , $ i^pp. J* H fftfi-j*ft^# t»f pf*^^!li^g ff !<- I w ^* onitoniin* 

3Qu*r**i of trie Aeou§tic#i Soci^t^ of A#tr u#< 69- S^B # 

m~ 
²Average F4 onset frequencies for five individual speakers (based on a
subset of the utterances) were 2862 Hz (RM), 3733 Hz (VM), 3962 Hz (SP), <303

preceding fricative, an analysis of variance was conducted on the burst
duration measurements. For the [ɑ] utterances, there was no significant
effect of the preceding fricative. Bursts were, however, significantly longer
for velar stops (2* msec) than for alveolar ones (16 msec), F(1,7) = 39.2, p <
.001. Bursts were also significantly longer following a syllable boundary,
F(1,7) = 11.3, p < .02, although the difference was only 2 msec. In the [u]
uttaranoaa, too, burata vara longar for volar atopa (24 Mao) than for 
alvaolar onaa (20 Mao), F(l,3) » 26.5, £ < .05, and burata tendad to ba 
longar following la! (?« aaao) than following [j: (20 moo), F(1,3) * 10.7. £ 
< .05, both af facta baing dua to unuaually ahort burata for alvaoiar atopa 
following la} (1? moo), Tha ayllabla boundary affact waa revaraad hara but 
nonalgnl f leant . * ' 




bur at duration contingent on 
phonetic, auditory, trading relations
speech processor
acoustic cues
array
phonetic structure

Speech Articulation:
coarticulation, lip rounding, temporal constraints
obstruents, deaf speakers

Reading:
prosody, orthography
'*ewn;« linguistic, nonlinguistic,
reading ability

Ecological Acoustics:
auditory information, breaking, bouncing.